1.  INTRODUCTION


Pancreatic cancer remains one of the deadliest cancers: the vast majority of patients are diagnosed
too late, and conventional therapies have been largely ineffective, so early detection methods and novel
drug targets are greatly needed. Recent studies have shown that a significant portion of genomic regions
previously thought to be transcriptionally silent are in fact expressed. Satellites are highly repetitive
regions of the genome whose expression is normally suppressed by heterochromatin; however, their
expression was found to be abundant in a wide variety of cancers. In particular, the HSATII satellite was
found to be specifically upregulated in cancer cells compared to all other repetitive elements and highly
elevated in preneoplastic lesions of the pancreas, which suggests its use as a novel biomarker for early
detection. Moreover, the ability to target HSATII in cancers may offer a novel therapeutic avenue. The goal
of this research is to understand the cellular and molecular impact of satellite RNA in cancer cells and to
test the utility of these highly specific and abundant transcripts as novel biomarkers for early detection.


2.  KEYWORDS 

cancer  genetics,  satellite  repeats,  metastasis,  circulating  tumor  cell,  pancreatic  cancer 


3.  ACCOMPLISHMENTS 

Aim  1:  Evaluation  of  Satellite  expression  on  transcriptional  profiles 

Task 1. Development of satellite-expressing cell lines: A doxycycline-inducible
HSATII vector was created containing a segment of HSATII approximately
800 bp in length. This vector was successfully transduced into the human
cancer cell line SW620. Induction of HSATII expression upon addition of
doxycycline was clearly evident and was confirmed by RNA-ISH (Fig. 1) and
northern blot.

Task 2. Effects of satellite on expression patterns: Satellite expression was
induced with doxycycline, and cell lines were evaluated for changes in expression
pattern using the Helicos single-molecule sequencing platform. A total of 126
genes were differentially expressed in HSATII-induced cell lines compared to
GFP-induced control cell lines. Gene ontology analysis of these genes did not
identify any major signaling pathways or other known expression profile
enrichment. Notably, 46% of these genes are upregulated in human brain tissue,
consistent with our prior correlation of satellite expression with neural genes.
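
To put the 46% overlap in context, one could ask whether it exceeds what chance alone would produce. The following is a minimal sketch, not the analysis used in this report: the gene symbols, the background size, and the function names `overlap_fraction` and `hypergeom_sf` are hypothetical placeholders, and a one-sided hypergeometric test is assumed as the enrichment statistic.

```python
from math import comb


def overlap_fraction(de_genes, reference_genes):
    """Fraction of differentially expressed genes also found in a reference set."""
    de = set(de_genes)
    if not de:
        raise ValueError("no differentially expressed genes supplied")
    return len(de & set(reference_genes)) / len(de)


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) when `draws` genes are sampled without replacement from a
    background of `population` genes, `successes` of which are in the reference set."""
    upper = min(successes, draws)
    favorable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favorable / comb(population, draws)


# Hypothetical toy input (not data from this study).
de_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
brain_enriched = ["GENE_B", "GENE_D", "GENE_E"]
frac = overlap_fraction(de_genes, brain_enriched)  # 2 of the 4 genes overlap
p_value = hypergeom_sf(2, population=10, successes=3, draws=4)
print(f"overlap = {frac:.0%}, one-sided p = {p_value:.3f}")
```

In practice the background would be the set of genes detectable on the sequencing platform, and the choice of that background strongly affects the resulting p-value.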


Fig. 1: SW620 HSATII inducible cell line without (top) and with doxycycline (bottom).
HSATII RNA-ISH (red) and DAPI nuclear stain (blue).


Although forced overexpression of HSATII did result in transcriptional and functional changes, we had learned
of a way to induce endogenous HSATII expression, which we believe is more physiologic. We modified our
statement of work (SOW) to reflect this change. Our previous work had shown that although HSATII is
highly expressed in tumors, culturing cancer cells in standard 2D adherent culture completely
suppresses HSATII (Fig. 2). Because of this finding, we amended our SOW to include a new Task 3.

Task 3: Understanding satellite deregulation:

We refocused this task on understanding satellite deregulation in cell lines. We performed a number of
perturbations in vitro to induce HSATII expression, including hypoxia, UV radiation, demethylation,
starvation, and growth in non-adherent conditions. Only growth in non-adherent conditions (as 3D tumor
spheres or in soft agar) was sufficient to induce HSATII expression in multiple cancer cell lines, including
pancreatic and colon cancers (Fig. 3). We have completed paired RNA-seq of 2D and 3D cells and are
continuing to analyze these data. We initially planned to perform targeted ChIP-seq but are awaiting
complete analysis of the RNA-seq data before pursuing this line of work. In summary, we have found an
interesting phenomenon in which HSATII expression is dynamically controlled by growth in adherent versus
non-adherent conditions. This provides a model system for studying the regulation of HSATII and also
offers a means to understand the impact of endogenous HSATII expression in cancer cells.

Fig. 2: HSATII induction from growth as tumors or tumorspheres. Northern blot of A) HCT116 and
SW620 cancer cell lines grown in 2D and as xenografts. B) Panel of cancer cell lines grown in 2D or 3D.

Aim  2:  Evaluation  of  Satellite  expression  on  metastatic  potential 

Task 1. Effects on adherent culture: HSATII induction in the HSATII-inducible cell lines produced some
changes in cell morphology and increased migratory function. However, there were no appreciable effects
on cell growth, and HSATII overexpression had a potentially negative effect on tumor sphere formation.
Our work on Aim 1 Task 3 showed that HSATII induction through growth in non-adherent conditions led to
expression of multiple transcript sizes rather than the single size produced by our vector.

When thinking about the potential functions of HSATII RNA, we began to compare HSATII to other
repeats in the genome. Previously, we had found satellites to be highly co-expressed with the LINE-1
element across mouse and human cancers. LINE-1 is an active retrotransposon that inserts throughout the
euchromatin of genomes, and recent publications have shown that LINE-1 retrotransposition is a common
event across a wide variety of malignancies. In addition, the telomere repeat is reverse transcribed
by telomerase in the vast majority of cancers. The parallels between these two major repeats led us to
hypothesize that HSATII may be reverse transcribed (RT) as a means to expand these regions in tumor
genomes. We evaluated the presence of HSATII RT products by treating xenograft small nucleic acid
extracts with DNase I and indeed found that these species were sensitive, which indicated the presence
of dsDNA (Fig. 3). Because of this unexpected finding, we focused our efforts on validating it and on
understanding the significance of this phenomenon in cancer function. Since LINE-1 retrotransposition
activity had previously been shown in colon cancer, we used colon cancer as a model to study this novel
reverse transcriptional mechanism.

Based on a number of biochemical experiments, we believe that reverse transcriptional machinery is highly
active and specific for satellite repeats in human cells. These RNA-derived DNAs (rdDNA) are found in
large amounts in primary tumors, xenografts, and tumorspheres and appear to be used as templates for
elongating the pericentromeric regions from which they originate. We validated the DNA expansion of
HSATII regions in our xenograft models and found that 50% of primary colon cancers have significant copy
number gains of HSATII as determined by whole genome sequencing. Importantly, these HSATII copy
number gains were found to confer a worse prognosis in these cancers (Fig. 4). Interestingly, we found
that certain HIV RT inhibitors (ddC) could block the generation of HSATII DNA/RNA hybrids, which had
anti-cancer effects in 3D culture and in xenografts (see Task 2). See the attached publication in
press at PNAS for more details.
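
The survival comparison in Fig. 4 rests on Kaplan-Meier estimates for the gain and no-gain groups. As an illustration only, a minimal sketch of the product-limit estimator is given here; the follow-up times and event flags are made-up values rather than TCGA data, the function name `kaplan_meier` is ours, and the log-rank test behind the reported p-value is not implemented.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed death.

    times:  follow-up duration for each patient
    events: 1 if death was observed at that time, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # deaths plus censorings at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Patients censored at t still count as at risk for deaths at t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Hypothetical follow-up times (months) and event flags, not study data.
example_curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
for t, s in example_curve:
    print(f"t={t}: S(t)={s:.2f}")
```

Running the estimator separately on the gain and no-gain groups yields the two step curves of a plot such as Fig. 4.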

Task 2. Effects on xenograft tumors: Because HSATII is differentially expressed in 2D culture compared to
3D culture and xenografts, we focused on these differences as a means to understand HSATII function.
Given the gains in HSATII copy number from the RT process, we hypothesized that blocking HSATII RT may
have differential effects in 3D culture and xenografts compared to 2D culture. Indeed, we find that the RT
inhibitor ddC or HSATII sequence-specific locked nucleic acids had an anti-cancer effect both in 3D tumor
spheres and in xenografts (Fig. 5). Together, these results reveal a new therapeutic vulnerability in cancer
that can potentially be rapidly translated to the clinic, given the availability of established drugs for HIV.

Fig. 3: A) Northern blot analysis for HSATII in colon cancer xenografts treated with DNase I showing
partial digestion. B) Model of HSATII RT and predicted sensitivity to nucleases.

Fig. 4: Kaplan-Meier survival analysis of colon cancer TCGA data. HSATII copy number gain (red) and
no gain (blue). Log-rank p-value shown.

Task 3. Effects on CTCs: Due to our findings above, we have deferred looking at the effects of HSATII on
CTCs. However, we note that the disruption of HSATII RT affects tumor sphere formation, which we had
previously shown to be a critical surrogate functional assay for CTC viability (Yu M*, Ting DT*, et al.
Nature 2012). Recent analysis of mouse pancreatic CTC data has identified elevated repeat expression in
CTCs compared to matched primary tumors, pointing towards increased satellite expression in CTCs
compared to primary tumor cells. Human HSATII expression in CTCs (see Aim 3) has also shown
increased detection sensitivity of CTCs, again pointing towards a relationship between satellite expression
and the metastatic process. We plan to formally evaluate effects on CTCs in the future using the pancreatic
genetically engineered mouse model, which was outside the scope of this grant.


Fig.  5:  NRTI  ddC  differential  effect  in  SW620  cell  line  grown  in  (A)  2D 
vs  (B)  3D.  NRTI  ddC  with  significant  reduction  in  SW620  xenograft 
(C)  proliferation  and  (D)  genomic  HSATII  copy  number  gain 


Aim  3:  Evaluating  Satellites  as  a  novel  CTC  Biomarker 


Task 1. Optimization of RNA-ISH assay for HSATII in CTCs: HRPO approval was obtained and testing
of HSATII ISH in clinical samples was initiated over the last year. Initial testing of RNA-ISH on the
3rd-generation IFD CTC-chip has been completed. Automated imaging analysis has been optimized, and
standard operating procedures have been established for the CTC diagnostic.

Task 2. Comparative analysis of HSATII RNA-ISH versus CK/EpCAM Immunofluorescence CTC
enumeration assays:

We have initiated a comparison of standard CK immunofluorescence (IF) to HSATII ISH in split blood
samples run on the IFD CTC-chip, both in patients with resectable pancreatic cancer and in patients with
cystic lesions of the pancreas called intraductal papillary mucinous neoplasms (IPMN). IPMNs are often
found incidentally, and only approximately 5-10% will progress to invasive cancer. Understanding which
patients require definitive surgical resection and which can be monitored is of great importance for the
field. CTCs may provide a means to predict which IPMN patients may be at risk of invasive disease. CTC
counts with standard CK+EpCAM compared to HSATII+KRT RNA-ISH show comparable sensitivity in IPMN
patients tested (Fig. 6). Both IF and RNA-ISH performed well in this pilot, with RNA-ISH demonstrating a
higher sensitivity of 74% compared to 50% using IF. We are continuing to monitor these patients for the
development of pancreatic cancer to see which marker or combination of markers is most prognostic in
this at-risk patient population. In parallel, we have also completed RNA-sequencing of CTC samples from
patients with IPMN and resectable pancreatic cancer to define the transcriptional changes (coding and
non-coding) that occur between preneoplastic and invasive carcinoma CTCs.

Fig. 6: CTCs identified by CK and EpCAM IF (Red) or HSATII and KRT RNA-ISH (Blue) from IPMN
patients.
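
If the sensitivities quoted above are read as detection rates (our assumption), each is the fraction of tested patients in whom an assay finds CTCs. A minimal sketch of that calculation on paired split-sample counts follows; the per-patient counts, the positivity threshold of one CTC, and the function name `detection_rate` are illustrative assumptions, not values or methods taken from this study.

```python
def detection_rate(ctc_counts, threshold=1):
    """Fraction of patients whose CTC count meets the positivity threshold."""
    if not ctc_counts:
        raise ValueError("no patients supplied")
    positive = sum(1 for count in ctc_counts if count >= threshold)
    return positive / len(ctc_counts)


# Hypothetical CTC counts per patient from split blood samples (same patient order).
if_counts = [0, 3, 0, 1, 0, 2]
ish_counts = [1, 5, 0, 2, 0, 4]
print(f"IF:      {detection_rate(if_counts):.0%}")
print(f"RNA-ISH: {detection_rate(ish_counts):.0%}")
```

Because both assays are run on the same blood draw, a paired comparison of which patients are positive by one assay but not the other would be more informative than the two rates alone.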


What  opportunities  for  training  and  professional  development  has  the  project  provided? 

Nothing  to  Report 

How  were  the  results  disseminated  to  communities  of  interest? 

Nothing  to  Report 

What  do  you  plan  to  do  during  the  next  reporting  period  to  accomplish  the  goals? 

Nothing  to  Report 

4.  IMPACT 

The massive expression of satellite repeats in virtually all epithelial cancers was an unexpected finding with
implications both as a cancer diagnostic and as a previously unappreciated phenomenon in cancer biology. In
our pursuit to understand the biological regulation and function of satellites, we have discovered a novel
reverse transcriptional mechanism that expands satellite repeat regions in the genome and that, when
blocked, leads to anti-cancer effects in both tumor spheres and xenografts. Increased levels of satellites in
circulating tumor cells (CTCs) support a relationship of HSATII with metastasis and, more importantly, may
prove to be a blood-based early detection diagnostic for pancreatic cancer.

This work has led to fruitful collaborations with others as well as additional funding. A collaboration with
Benjamin Greenbaum and Nina Bhardwaj at Mt. Sinai Cancer Center has led to the discovery that HSATII RNA
is immunostimulatory to human macrophages (see attached manuscript, Tanne et al. PNAS 2015). This work,
combined with the work we have done, indicates that HSATII RNA has not only tumor cell-specific effects but
also microenvironmental effects that are important in cancer cell progression. This has led to the formation
of a SU2C-NSF-V Foundation Convergence team focused on understanding the role of immunotherapy in
pancreatic cancer. As part of this collaborative research group, I will be studying the role of HSATII RNA in
this context. In summary, we have made significant progress in understanding the mechanistic underpinnings
of these repetitive elements in cancer, identified a novel therapeutic opportunity in disrupting these
repeats, and developed a new blood-based diagnostic platform based on HSATII RNA.

5.  CHANGES/PROBLEMS 

All changes for this project are detailed in Section 3, Accomplishments. These were approved by the grants
administrator, and the SOW was modified to reflect them.

Aberrant transcription of the pericentromeric human satellite II
(HSATII)  repeat  is  present  in  a  wide  variety  of  epithelial  cancers.  In 
deriving  experimental  systems  to  study  its  deregulation,  we 
observed  that  HSATII  expression  is  induced  in  colon  cancer  cells 
cultured  as  xenografts  or  under  nonadherent  conditions  in  vitro,  but 
it  is  rapidly  lost  in  standard  2D  cultures.  Unexpectedly,  physiological 
induction  of  endogenous  HSATII  RNA,  as  well  as  introduction  of 
synthetic  HSATII  transcripts,  generated  cDNA  intermediates  in  the 
form  of  DNA/RNA  hybrids.  Single  molecule  sequencing  of  tumor 
xenografts showed that HSATII RNA-derived DNA (rdDNA) molecules
are stably incorporated within pericentromeric loci. Suppression
of RT activity using small molecule inhibitors reduced HSATII
copy  gain.  Analysis  of  whole-genome  sequencing  data  revealed  that 
HSATII  copy  number  gain  is  a  common  feature  in  primary  human 
colon  tumors  and  is  associated  with  a  lower  overall  survival. 
Together, our observations suggest that cancer-associated derepression
of specific repetitive sequences can promote their RNA-driven
genomic expansion, with potential implications on
pericentromeric  architecture. 

satellites  |  reverse  transcription  |  repeats  |  cancer 

Pericentromeric satellite repeats are essential core centromere-building elements that stabilize
interactions with DNA-binding proteins, maintain heterochromatin architecture, sustain
kinetochore formation, and drive chromosomal segregation during mitosis, thereby ensuring
faithful duplication of the genome (1). Transcription from pericentromeric satellites has been
reported in plants and invertebrates, as well as during early stages of vertebrate development,
and some types of satellite repeats are induced following environmental stress in cell line
models (2). We recently reported the massive overexpression of specific classes of satellite
repeats in human epithelial cancers, resulting from aberrant transcription of these
pericentromeric domains (3). In almost all cancers analyzed, subsets of pericentromeric
satellites are expressed at very high levels (3-5), whereas others show consistently reduced
expression compared with normal tissues.

The human satellite II (HSATII) is the most differentially expressed satellite subfamily in
epithelial cancers (3). It constitutes the main component of pericentromeric heterochromatin
on chromosomes 2, 7, 10, 16, and 22 (UCSC Genome Browser, genome.ucsc.edu), and it is also
found at chromosome band 1q12, where it is colocated with satellite III sequences. HSATII
is defined by tandemly repeated divergent variants of 23- to 26-bp consensus sequences,
organized in long arrays that may span up to thousands of kilobases (6). Although repetitive
DNA sequences are frequently hypomethylated in cancer cells (7), the mechanisms underlying
their aberrant expression are not well understood. For instance, loss of DNA methylation
alone does not result in overexpression of the HSATII satellite (8), suggesting the existence
of more complex regulatory networks.


In establishing models to study the molecular basis of HSATII RNA overexpression in cancer,
we found that growth of cells under nonadherent conditions is sufficient to trigger induction
of this satellite repeat. Unexpectedly, under these and other conditions, we uncovered that
these repeated transcripts are reverse-transcribed into DNA/RNA hybrids. The reintegration
of HSATII RNA-derived DNA (rdDNA) is correlated with a progressive expansion of host HSATII
genomic loci. These results point to an unexpected plasticity of pericentromeric
repeat-containing structures during cancer progression.

Significance

Unique among the large number of noncoding RNA species, the pericentromeric human satellite II
(HSATII) repeat is massively expressed in a broad set of epithelial cancers but is nearly
undetectable in normal tissues. Here, we show that deregulation of HSATII expression is tightly
linked to growth under nonadherent conditions, and we uncover an unexpected mechanism by which
HSATII RNA-derived DNA (rdDNA) leads to progressive elongation of pericentromeric regions in
tumors. The remarkable specificity of HSATII overexpression in cancers, together with the
consequences of targeting its RT, points to a potential novel vulnerability of cancer cells.

Results and Discussion

Detection of HSATII Expression in Human Tumors and 3D Cancer Cell Models. The highly repetitive
nature of satellites precludes their precise quantitation and qualitative analysis using
PCR-based RNA sequencing approaches. We previously showed that PCR-independent single molecule
next-generation sequencing [digital gene expression (DGE) profiling] is uniquely sensitive and
quantitative (9), although it is not suited to routine analysis and does not provide
qualitative length information. To enable experimental models for the study of HSATII
deregulation, we first designed a modified Northern blot HSATII assay (SI Appendix, Fig. S1A).
HSATII satellite transcripts encompass arrays of variable lengths derived from multiple
different genomic locations (6); thus, Northern blotting generates a pattern of bands ranging
from 30 nt to greater than 800 nt in size (SI Appendix, Fig. S1B), consistent with the pattern
reported for other repeats, such as murine minor satellites (10) and human satellite III (11).
Quantitation of DGE profiles and Northern blot signal intensity for matched primary cancer
specimens were highly correlated (SI Appendix, Fig. S1 B and C).

We observed that human colorectal cancer cell lines do not express HSATII under standard in
vitro adherent (2D) culture conditions, but strongly up-regulate its expression when grown as
tumor xenografts (Fig. 1A). To define specific experimental conditions that modulate HSATII
expression within tumors, we tested multiple stimuli associated with cellular stress and
tumorigenesis, including hypoxia, UV irradiation, heat shock, oxidative stress, overconfluence,
treatment with demethylating agents, coculturing with stromal-derived feeder layers, and
culture under anchorage-free conditions (SI Appendix, Fig. S2 A-D). Remarkably, only culture
under nonadherent conditions, as 3D tumor spheres in solution or in soft agar, led to robust
induction of HSATII in five colorectal cancer cell lines, as detected by Northern blotting
(Fig. 1B). The specific induction of HSATII RNA by anchorage-independent growth was also
demonstrated using RNA in situ hybridization, an imaging assay that does not involve nucleic
acid extraction or denaturation, confirming the RNA specificity of the hybridization signal
(Fig. 1C). A sixth colorectal cancer line, COLO205, noteworthy for its semiadherent growth
pattern, was unique in expressing HSATII RNA at baseline under standard culture conditions
(Fig. 1B). Notably, the elevated RNA levels detected in tumor spheres and xenografts from two
independent cell lines were rapidly lost upon reestablishing the growth of these cells under
adherent 2D culture (SI Appendix, Fig. S2E). Consistent with our previous findings in primary
tumors (3), HSATII transcripts were present in both sense (S) and antisense (AS) orientations
(SI Appendix, Fig. S2F), and they were primarily localized to the nuclear compartment in a
panel of colon cancer cell lines (SI Appendix, Fig. S2G), similar to other repetitive noncoding
RNAs such as telomeric repeat-containing RNA (TERRA) (12).

Previous studies have shown that satellite expression in both human and mouse cells may be due
to a combination of environmental stimuli and genetic factors. For instance, human satellite
III is expressed in response to cellular stress, including UV-C, oxidative, heat shock, and
hyperosmotic stress (13), whereas mouse pericentromeric major satellites can be induced by
growth to confluence or treatment with hypomethylating, apoptotic, or differentiation-inducing
agents (10). Genetic lesions in the tumor suppressor BRCA1 lead to Alpha-satellite sequence
derepression in human breast epithelial cells (5), and deletion of Trp53 in mouse cells results
in murine major satellite expression (14). Remarkably, the HSATII pericentromeric repeat is
resistant to general environmental stressors and differences in oncogene or tumor suppressor
genotypes under standard adherent culture. Only growth under nonadherent conditions is
sufficient to induce robust HSATII expression across different cancer cell lines, an
interesting phenomenon that merits further investigation.

Fig. 1. HSATII is expressed in human tumors and 3D cancer cell models.
(A) Northern blot analysis of HSATII expression in HCT116 and SW620 cells
grown as 2D cultures or xenografts (Xeno). (B) Northern blot analysis of
HSATII expression in colon cancer cell lines grown as 2D cultures or tumor
spheres (3D). Ethidium bromide (EtBr) stainings of gels are shown for each
Northern blot analysis as loading controls. (C) RNA in situ hybridization (with
the indicated fluorescent probes) of SW620 cells cultured under 2D conditions
or as tumor spheres (3D). HSATII/DAPI colocalization coefficient measured
by confocal imaging: R = 0.6 ± 0.04.

Identification of Medium/Small-Molecular-Weight HSATII DNA. Unexpectedly, we observed that a
fraction of the xenograft-induced HSATII sequences present within medium/small-molecular-weight
nucleic acids (extraction with Trizol; ThermoFisher) was sensitive to DNase I (Fig. 2A). This
event may result from genomic DNA (gDNA) contamination or from the existence of small dsDNA,
dsRNA partially susceptible to DNase I, and/or DNA/RNA hybrids. To exclude the possibility of
gDNA contamination, we performed multiple controls. First, during nucleic acid extraction, we
applied a solid barrier to prevent any cross-contamination between the aqueous phase and the
organic, gDNA-containing phase (phase lock gel tubes) (SI Appendix, Fig. S3A). Second, we
showed that high-molecular-weight HSATII gDNA from a colon cancer xenograft is readily
distinguished by Northern blotting from separately processed small-molecular-weight RNA/DNA
from the same tissue, without evidence of cross-contamination (SI Appendix, Fig. S3B). Third,
using multiple cell lines, we showed that the presence of medium/small-molecular-weight HSATII
sequences is only evident following culture under 3D or xenograft conditions. Identical and
simultaneous analysis of these cells cultured under 2D conditions showed no evidence of these
small HSATII sequences (Fig. 1 A and B and SI Appendix, Fig. S2). Finally, under identical
processing conditions, the rapid loss of HSATII signal following replating of 3D cultures into
2D cultures further excluded gDNA contamination as a source for the HSATII DNA signal on
Northern blotting (SI Appendix, Fig. S2E). We therefore concluded that deregulated HSATII
satellite transcripts coexist with matched DNA fragments within the
medium/small-molecular-weight nucleic acid fraction of cells that overexpress this satellite
repeat. Given the presence of small RNA and DNA species, we hypothesized that HSATII RNA may be
reverse-transcribed into DNA/RNA hybrids and, ultimately, dsDNA (SI Appendix, Fig. S3C), a
phenomenon known to occur with other repetitive elements, such as long interspersed nuclear
elements (LINEs), short interspersed nuclear elements (SINEs), and long terminal repeat (LTR)
retrotransposons (15).

Generation  of  DNA/RNA  Hybrids  upon  Ectopic  Introduction  of  HSATII 
RNA.  To  capture  the  HSATII  RNA-to-DNA  conversion,  we  first 
developed  an  assay  to  introduce  synthetically  produced  HSATII 
RNA  generated  by  in  vitro  transcription  (IVT)  directly  into  2D- 
cultured  293T  cells  that  lack  endogenous  HSATII  expression  (SI 
Appendix,  Fig.  S3D).  To  assess  the  formation  of  DNA/RNA 
hybrids  in  cells  transfected  with  single-stranded  IVT  HSATII 
RNA  (HSATII-chrlO),  we  subjected  nucleic  acid  extracts  to 
treatment  with  RNase  H,  which  specifically  digests  the  RNA 
moiety  of  DNA/RNA  hybrids  but  does  not  affect  either  ssRNA 
or  the  DNA  component  of  DNA/RNA  hybrids  (Fig.  2 B).  Indeed, 
RNase  H  treatment  caused  a  strong  reduction  in  the  Northern 
blot  signal  identified  for  the  HSATII  S  (but  not  AS)  sequence 
(Fig.  2C  and  SI  Appendix,  Fig.  S3 E).  Thus,  consistent  with  the 
generation  of  rdDNA,  a  fraction  of  the  IVT-produced  RNA  is 
within  a  complex  with  a  cDNA  strand.  Transfection  of  compa¬ 
rable  amounts  of  IVT  GFP  RNA  produced  the  expected  RNA 
signal  but  showed  no  significant  sensitivity  to  RNase  H  (Fig.  2D), 
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Fig.  2.  Ectopic  HSATII  RNA  gives  rise  to  DNA/RNA  intermediates.  (A)  Northern 
blot  analysis  of  HSATII  in  untreated  (NT)  or  DNase  l-treated  extracts  ob¬ 
tained  from  SW620  xenografts.  Numbers  below  indicate  relative  sig¬ 
nal  quantitation.  ( B )  TRIzol  extracts  obtained  from  293T  cells  24  h  after 
transfection  with  HSATII/GFP  IVT  products  were  subjected  to  RNase  H 
treatment,  followed  by  Northern  blotting  and  hybridization  to  detect  the 
RNA  strand  (S-HSATII)  of  the  hybrid.  (C)  Northern  blot  analysis  of  extracts 
from  293T  cells  either  untransfected  or  transfected  with  IVT  HSATII  or  GFP, 
subjected  to  the  indicated  nuclease  treatment  and  probed  for  HSATII  S. 
(D)  Northern  blot  analysis  of  extracts  from  293T  cells  after  transfection 
with  IVT  GFP,  treated  with  RNase  H  and  probed  for  GFP  S.  Numbers  below 
indicate  relative  signal  quantitation. 


and  introduction  of  GFP  RNA  did  not  lead  to  the  induction  of 
HSATII  satellite  RNA  (Fig.  2 C,  last  two  lanes).  These  findings 
exclude  the  occurrence  of  nonspecific  responses  to  RNA  trans¬ 
fection.  We  conclude  that  introduction  of  purified  S  HSATII 
RNA  generates  a  DNA/RNA  hybrid  in  IVT  RNA-transfected 
cells.  Because  IVT  using  T7  polymerase  relies  on  a  PCR-generated 
DNA  template  as  starting  material,  we  included  multiple  controls 
to  ensure  the  absence  of  DNA  template  contamination  within  the 
IVT  product  itself,  as  well  as  any  genomic  HSATII  sequences  in 
the  cellular  extracts  (SI  Appendix,  Fig.  S3  F-L).  These  results  indicate
that  ectopically  introduced  single-stranded  HSATII  RNA  is
capable  of  generating  cDNA  within  transfected  cells. 

To  validate  these  results  further,  we  used  the  S9.6  monoclonal 
antibody,  which  is  highly  specific  in  its  recognition  of  DNA/RNA 
hybrids  (16-19).  We  established  a  DNA/RNA  hybrid  immunoprecipitation
(DRIP)  assay  using  quantitative  PCR  (qPCR)  of
S9.6  immunoprecipitates  (HSATII-chr10  qPCR),  which  was
applied  to  nucleic  acids  from  untransfected  or  IVT  HSATII 
RNA-transfected  cells.  Samples  were  first  subjected  to  complete 
DNase  I  digestion  (which  removes  all  dsDNA  but  does  not  affect 
DNA/RNA  hybrids),  followed  by  DRIP  analysis  (Fig.  3A,  Left).
HSATII  DNA/RNA  duplexes  were  present  only  in  293T  cells 
transfected  with  HSATII  RNA,  and  treatment  of  extracts  with 
RNase  H  effectively  abolished  immunoprecipitation  of  the  HSATII 
DNA/RNA  hybrids  (Fig.  3B).  Taken  together,  the  formation  of
HSATII  RNA/DNA  hybrids  in  a  controlled  IVT  model  and  the 
detection  of  these  species  by  DRIP  are  suggestive  of  RT. 

RT  of  Endogenous  HSATII  RNA  into  DNA/RNA  Hybrids.  To  extend  these 
analyses  to  a  more  physiological  context,  we  assessed  the  presence 
of  endogenous  HSATII  DNA/RNA  hybrids  by  applying  the  DRIP 
assay  to  SW620  colon  cancer  cells  grown  as  3D  tumor  spheres  (Fig.
3A,  Right).  RNase  H-sensitive  DNA/RNA  HSATII  hybrids  were
immunoprecipitated  using  the  DRIP  assay  in  SW620  spheroids  (Fig.
3C).  DNA/RNA  hybrids  were  also  identified  in  COLO205  semi-attached
cell  cultures  (SI  Appendix,  Fig.  S3M),  which  are  characterized
by  baseline  expression  of  HSATII  transcripts  (Fig.  1B).

To  evaluate  the  consequences  of  RT  inhibition  on  the  formation
of  these  hybrids,  we  tested  the  effect  of  the  nucleoside  analog
reverse  transcriptase  inhibitor  (NRTI)  2',3'-dideoxycytidine  (ddC)
in  HSATII-expressing  cells  (Fig.  3A,  Right).  Notably,  ddC  is  very
poorly  incorporated  by  replicative  polymerases  (20,  21),  although 
it  displays  high  specificity  for  multiple  classes  of  RT,  including 
LINE-1  (22).  Treatment  of  SW620  spheroids  and  COLO205  cells 
with  ddC  significantly  reduced  the  levels  of  endogenous  HSATII 
DNA/RNA  hybrids,  as  measured  by  the  DRIP  assay  (Fig.  3C  and 
SI  Appendix,  Fig.  S3M).  These  observations  are  consistent  with  RT
activity  in  HSATII-expressing  cells,  contributing  to  the  generation 
of  DNA/RNA  structures  derived  from  satellite  transcripts. 
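
The DRIP-qPCR readout used here can be made concrete with a short calculation. The figure legend states that fold changes were calculated from percent input values, with the RNase H-treated samples set at 1. The sketch below follows the common percent-input convention; the Ct values, the 1% input fraction, and the function names are hypothetical, and perfect doubling per PCR cycle is assumed rather than taken from the source.

```python
import math


def percent_input(ct_ip: float, ct_input: float, input_fraction: float) -> float:
    """Percent of input recovered in the immunoprecipitate (IP).

    ct_input is measured on an aliquot representing `input_fraction` of the
    material used for the pull-down (0.01 for a 1% input), so it is first
    adjusted to the Ct expected for 100% input.
    Assumption: 100% PCR efficiency (signal doubles every cycle).
    """
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)


def fold_enrichment(sample_pct: float, reference_pct: float) -> float:
    """Fold change relative to a reference sample that is set at 1."""
    return sample_pct / reference_pct


# Hypothetical Ct values, for illustration only.
untreated = percent_input(ct_ip=26.0, ct_input=24.0, input_fraction=0.01)
rnase_h = percent_input(ct_ip=29.0, ct_input=24.0, input_fraction=0.01)

print(f"untreated: {untreated:.4f}% of input")
print(f"RNase H:   {rnase_h:.4f}% of input")
print(f"fold enrichment (RNase H set at 1): {fold_enrichment(untreated, rnase_h):.1f}")
```

Because both samples share the same input Ct in this toy case, the fold change reduces to the three-cycle difference between the two immunoprecipitates.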

RT  activity  in  mammalian  cells  is  derived  primarily  from  retrotransposons,
including  LINE-1,  the  human  endogenous
Fig.  3.  DRIP  reveals  the  presence  of  ectopic  as  well  as  endogenous  HSATII
hybrids  whose  production  is  affected  by  RT  inhibition.  (A)  Outline  of  the
experimental  layout.  Total  nucleic  acids  (TNAs)  were  isolated  from  IVT
HSATII-transfected  293T  cells  or  SW620  tumor  spheres  cultured  in  the  presence
of  ddC  or  DMSO  (ddC-).  TNAs  were  subjected  to  DNase  I  digestion  to
remove  all  potential  gDNA  contamination.  DNA/RNA  hybrids  were  then
purified  by  immunomagnetic  pull-down  using  a  hybrid-specific  antibody,
and  their  relative  quantities  were  measured  by  HSATII-chr10  qPCR.  Pretreatment
of  TNA  samples  with  RNase  H  as  indicated  demonstrates  abrogation
of  DNA/RNA  hybrid  detection.  (B)  Fold  change  in  the  enrichment  of
DNA/RNA  hybrids  in  HSATII-transfected  293T  cells  measured  by  qPCR  after
DRIP.  (C)  Fold  enrichment  of  endogenous  HSATII  DNA/RNA  hybrids  in  SW620
tumor  spheres  analyzed  by  HSATII-chr10  qPCR  after  DRIP.  Fold  changes  were
calculated  based  on  percent  input  values,  and  the  RNase  H-treated  samples
were  set  at  1.  For  all  charts,  values  represent  the  average  of  three  independent
experiments  ±  SEM.  *P  <  0.05  (t  test).
Fig.  4.  HSATII  rdDNA  is  reintegrated  at  the  same  original  loci  in  the  genome,
leading  to  pericentromere  elongation  in  colon  cancer  xenografts.
(A)  DGE  (RNA)  and  copy  number  (gDNA)  analysis  of  satellite  repeats  (HSATII
and  GSATII)  in  the  indicated  samples  (SW620)  quantitated  by  single  molecule
sequencing.  (B)  Representative  HSATII  DNA  FISH  (white  arrowheads)  on
metaphase  spreads  of  prexenograft  (Pre-xeno)  2D  cultures  and  xenografts
obtained  from  SW620  cells.  (Insets)  Enlarged  (1,000×)  HSATII-positive  chromosomes.
CNV  in  SW620  cells  was  assessed  by  qPCR  on  the  HSATII-chr16-1
locus  (C),  HSATII-chr16-2  locus  (D),  and  chromosome  16q  arm  (E).  Cycle
threshold  values  for  all  samples  were  normalized  against  β-actin,  and  DNA
CNV  is  expressed  relative  to  SW620  cells  before  xenograft  implants,  which
was  set  at  1  (T2,  T6,  T10  =  1  wk  of  culture  after  the  second,  sixth,  and  10th
serial  transplants,  respectively).  Error  bars  represent  SD  (n  =  3).


retrovirus  (HERV)  family  members,  and  human  telomerase  reverse
transcriptase  (hTERT).  Definitive  identification  of  the
cellular  RTs  responsible  for  HSATII  RT  is  complicated  by  the 
diversity  of  the  retrovirally  encoded  enzyme  families  and  the 
limited  reagents  available  for  their  analysis.  Attempts  to  reduce 
HERV  and  LINE-1  levels  in  our  cell  line  models  with  previously 
published  siRNAs  (23)  were  unsuccessful.  Because  tools  to 
evaluate  hTERT  were  readily  available,  we  undertook  RNA 
immunoprecipitation  analysis  of  endogenous  hTERT  from  nuclear
extracts  of  SW620  and  HCT116  cell  lines,  followed  by  RT-qPCR
for  HSATII  species.  Significant  enrichment  of  HSATII
RNA  with  hTERT  immunoprecipitates  was  evident,  relative  to
control  IgG  (SI  Appendix,  Fig.  S4  A  and  B).  Importantly,  this
enrichment  was  only  observed  when  the  cancer  cells  were  grown
as  xenografts,  but  not  when  they  were  cultured  under  standard 
2D  in  vitro  conditions.  As  a  control,  the  coprecipitation  with 
hTERT  of  telomerase  RNA  component  (TERC),  its  primary 
RNA  template  for  telomere  elongation,  was  unaffected  by  growth 
conditions.  Negligible  coprecipitation  with  hTERT  was  observed 
for  an  unrelated  noncoding  RNA  target.  Furthermore,  TERT 
knockdown  using  three  independent  siRNAs  demonstrated  significant
reduction,  although  not  complete  depletion,  of  HSATII
rdDNA  species  (SI  Appendix,  Fig.  S4  C  and  D).  Thus,  hTERT  may
mediate  HSATII  RT,  although  the  contribution  of  other  known 
cellular  RTs  cannot  be  excluded. 

Progressive  Expansion  of  Pericentromeric  Loci  Through  Stable  Reintegration
of  HSATII  DNA  Sequences.  HSATII-derived  rdDNA  fragments
within  the  cell  nuclear  fraction  (SI  Appendix,  Fig.  S2G)  may  either
give  rise  to  extrachromosomal  elements  or  be  integrated  at
chromosomal  loci,  leading  to  stable  expansion  of  HSATII  genomic
sequences.  By  analogy,  RT  of  LINE-1  transcripts,  followed
by  retrotransposition  at  chromosomal  loci,  has  been
described  in  epithelial  cancers,  including  colon  carcinoma  (24).
To  address  this  possibility,  we  first  analyzed  the  dynamics  of  global 
HSATII  RNA-  and  DNA-level  changes  using  single  molecule  sequencing
in  SW620  colon  cancer  cells  transitioned  from  2D  in  vitro
culture  conditions  to  growth  as  mouse  xenografts,  and  vice  versa.  As
expected,  the  number  of  HSATII  RNA  reads  was  minimal  when
cells  were  cultured  in  2D  conditions,  induced  360-fold  as  the  cells
gave  rise  to  xenografts  in  mice,  and  then  promptly  down-regulated
as  xenograft-derived  tumor  cells  were  returned  to  in  vitro  2D  cultures
(Fig.  4A,  blue  solid  line).  Remarkably,  total  cellular  HSATII
DNA  copy  number,  which  was  already  abundant  at  baseline,  increased
by  25-fold  as  2D-cultured  cells  were  transitioned  to  xenografts
(taking  into  account  the  multiple-length  variants  of  the
HSATII  tandem  repeat  unit).  The  amplified  HSATII  DNA  sequences
remained  stably  expanded  despite  the  down-regulation  of
HSATII  RNA  transcripts  when  cells  were  reestablished  under  2D
culture  conditions  in  vitro  (Fig.  4A,  red  solid  line).  As  a  control,  we
analyzed  gamma  satellite  II  (GSATII),  which  is  structurally  similar
to  HSATII  but  whose  expression  is  not  deregulated  in  cancer  (3).
SW620  cells  showed  negligible  GSATII  changes,  in  either  RNA  or
DNA  content,  as  cells  transitioned  between  2D  in  vitro  and  xenograft
culture  conditions  (Fig.  4A).  Notably,  the  cancer-enriched
human  alpha  satellite  (ALR/Alpha)  and  simple  satellite  repeat
(CATTC)n  RNAs  were  also  induced  at  high  levels  in  xenografts
with  conjugate  DNA  copy  number  gains,  suggesting  a  common
mechanism  of  RT-mediated  genomic  expansion  of  particular  repeat
classes  (SI  Appendix,  Table  S1).
Fig.  5.  Pericentromeric  HSATII  repeats  expand  both
locally  and  genome-wide  in  primary  human  colon
cancer  samples.  CNV  analysis  of  HSATII-chr16-1
(A)  and  HSATII-chr16-2  (B)  loci  on  the  indicated
paired  colon  specimens  (n  =  10)  is  shown.  For  each
sample,  values  were  normalized  for  β-actin  DNA  and
corrected  for  chr16q  arm  changes.  Probability  was
measured  by  the  paired  t  test.  FC,  mean  fold  change.
(C)  Relative  percentage  of  HSATII  copy  number
changes  in  colon  tumor/normal  pairs  according  to
combined  HSATII-chr16-1  and  HSATII-chr16-2  CNV
analysis,  including  correction  for  chr16  arm  gains/
losses.  (D)  Heat  map  of  whole-genome  sequencing
data  on  the  indicated  primary  colon  cancer  specimens
based  on  a  log2  ratio  cutoff  of  0.1.  (E)  Kaplan-Meier
curve  of  overall  survival  (days)  of  patients  with  primary
colon  cancers  with  HSATII  CNV  gain  or  no  gain.
P  =  0.034  (log-rank  P  value).




To  address  the  localization  of  the  amplified  HSATII  DNA 
sequences,  we  performed  HSATII  DNA  FISH  analysis  of  40
xenograft-derived  metaphase  spreads.  This  analysis  did  not  reveal  detectable
extrachromosomal  elements,  and  no  hybridization  signal
was  visible  outside  the  five  chromosomal  loci  known  to  harbor
long  arrays  of  pericentromeric  HSATII  (Fig.  4B).  Owing  to  the
limitations  of  this  assay,  we  could  not  discern  an  increase  in
fluorescence  intensity  or  size  of  the  hybridization  signal  that  would
demonstrate  local  expansion.  However,  consistent  with  the  FISH  data,  alignment  of
HSATII  gDNA  reads  obtained  by  single  molecule  sequencing
from  SW620  xenografts  showed  that  the  additional  HSATII  sequences
were  distributed  among  the  various  endogenous  preexisting
HSATII  pericentromeric  loci  (SI  Appendix,  Fig.  S5  A
and  B).

To  model  HSATII  DNA  copy  gain  over  time  and  as  a  function 
of  tumor  progression,  we  serially  transplanted  SW620  cells  as 
xenografts  over  multiple  generations  of  mice.  Progressive  amplification
of  HSATII  gDNA  was  evident  over  10  successive
rounds  of  in  vivo  tumor  initiation,  as  measured  in  cultures  derived
from  xenografted  cells.  This  amplification  was  assessed  using  an
~170-bp  qPCR-based  copy  number  variation  (CNV)  assay  at  the
two  highest  density  HSATII  pericentromeric  regions  on  chromosome
16q  (HSATII-chr16-1  and  HSATII-chr16-2;  Fig.  4  C  and  D
and  SI  Appendix,  Fig.  S5  C  and  D).  An  adjacent  chromosomal
region  showed  no  xenograft  transplantation-associated  copy
number  changes,  ruling  out  nonspecific  gains  in  the  16q  chromosomal
arm  or  in  ploidy  (Fig.  4E  and  SI  Appendix,  Fig.  S5E).  We
did  not  observe  a  single  chromosome  locus  duplication  event 
during  tumor  progression  leading  to  a  discrete  increase  in  HSATII 
gDNA.  Instead,  the  pericentromeric  genomic  loci  demonstrated  a 
gradual  increase  in  HSATII  gene  copy  number  over  time,  all 
within  preexisting  satellite  domains.  Such  a  time  line  would  be 
consistent  with  multiple  rdDNA-mediated  reintegration  events. 
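
The qPCR-based CNV readout described above (cycle threshold values normalized against β-actin and expressed relative to prexenograft cells set at 1) corresponds to a standard comparative Ct calculation. The following is a minimal sketch of that arithmetic, assuming 100% amplification efficiency; the Ct values and the function name are hypothetical and are not data from this study.

```python
def relative_copy_number(ct_target: float, ct_reference: float,
                         ct_target_cal: float, ct_reference_cal: float) -> float:
    """Comparative Ct (delta-delta Ct) estimate of relative DNA copy number.

    ct_target / ct_reference: Ct of the satellite locus and of the reference
    gene (here beta-actin) in the sample of interest.
    ct_target_cal / ct_reference_cal: the same two values in the calibrator
    sample, which is set at 1 (here the prexenograft culture).
    Assumption: both amplicons double every cycle.
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_cal - ct_reference_cal
    return 2.0 ** -(delta_sample - delta_calibrator)


# Hypothetical (HSATII Ct, beta-actin Ct) pairs across serial transplants.
ct_values = {
    "Pre-xeno": (20.0, 22.0),
    "T2": (19.5, 22.0),
    "T6": (19.0, 22.0),
    "T10": (18.0, 22.0),
}

cal_target, cal_reference = ct_values["Pre-xeno"]
relative = {
    name: relative_copy_number(target, reference, cal_target, cal_reference)
    for name, (target, reference) in ct_values.items()
}

for name, value in relative.items():
    print(f"{name}: {value:.2f}")
```

In this toy series, each one-cycle drop in the HSATII Ct at a constant β-actin Ct corresponds to a doubling of the estimated copy number.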

Common  Pericentromeric  Expansion  of  HSATII  Repeats  in  Human 
Colorectal  Cancers.  To  determine  whether  HSATII  copy  number 
gains  occur  in  primary  human  colon  cancer,  we  analyzed  CNV  in 
10  pairs  of  primary  tumors  and  their  matched  adjacent  normal 
tissue,  focusing  again  on  the  chromosome  16q  (HSATII-chr16-1
and  HSATII-chr16-2)  loci.  After  correcting  for  chr16q  arm  loss
or  gain,  significantly  increased  HSATII  copy  number  was  evident
at  either  or  both  of  the  two  independent  HSATII  loci  tested  in
five  of  10  (50%)  colon  cancers  (Fig.  5  A-C  and  SI  Appendix,  Fig.
S6A).  Among  other  cancers  similarly  analyzed,  HSATII  gene
copy  gain  was  evident  in  five  of  13  (38%)  kidney  cancers  (SI
Appendix,  Fig.  S6B).
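
The correction for chromosome arm changes can be illustrated as a simple ratio. The exact form of the correction is not spelled out in this section, so the sketch below assumes the tumor/normal value at the HSATII locus is divided by the tumor/normal value of the flanking chr16q arm; the function names and example ratios are hypothetical. The last lines only recompute the percentages quoted in the text from the reported counts.

```python
def arm_corrected_ratio(hsatii_tumor_vs_normal: float,
                        arm_tumor_vs_normal: float) -> float:
    """HSATII tumor/normal copy ratio corrected for chr16q arm gain or loss.

    Assumption: the correction is a plain division by the arm-level ratio,
    so a whole-arm gain does not register as a satellite-specific gain.
    """
    if arm_tumor_vs_normal <= 0:
        raise ValueError("arm-level ratio must be positive")
    return hsatii_tumor_vs_normal / arm_tumor_vs_normal


def percent(count: int, total: int) -> int:
    """Share of samples, rounded to the nearest whole percent."""
    return round(100.0 * count / total)


# Hypothetical tumor: HSATII signal is 3-fold up, but the arm is 1.5-fold up.
print(arm_corrected_ratio(3.0, 1.5))

# Hypothetical tumor: HSATII signal unchanged, but one arm copy was lost.
print(arm_corrected_ratio(1.0, 0.5))

# Counts reported in the text.
print(percent(5, 10), percent(5, 13))
```

The second example shows why the correction matters in both directions: an unchanged raw signal can still represent a locus-specific gain when the arm itself has been lost.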

To  extend  our  study  of  HSATII  gene  copy  changes  at  specific 
loci,  we  performed  a  genome-wide  survey  of  all  such  satellite
repeats  using  a  novel  satellite  CNV  algorithm  to  undertake 
computational  analyses  of  whole-genome  sequencing  from  The 
Cancer  Genome  Atlas  Project  (TCGA).  After  correction  of 
these  data  for  large  genomic  alterations,  comparable  in  size  to 
HSATII  stretches,  we  found  that  in  fully  annotated  genomic 
sequences  of  51  colorectal  cancers  (Dataset  S1),  23  (45%)  had
statistically  significant  genomic  gain  of  HSATII  compared  with 
their  matched  normal  germ  line  (Fig.  5D).  We  extended  this 
analysis  to  include  additional  satellite  repeats,  which  revealed 
higher  copy  number  gains  in  particular  in  the  satellites  whose  expression
was  enriched  in  cancers  [ALR/Alpha,  HSATII,  and
(CATTC)n],  but  not  in  GSATII  and  other  repeats  whose  expression
is  not  deregulated  in  cancer  cells  (Fig.  5D  and  SI  Appendix,
Fig.  S6C).  HSATII  copy  gain  co-occurred  more  frequently  with
ALR/Alpha  and  (CATTC)n,  indicating  a  common  mechanism  for
repeat  expansion  in  some  groups  of  repeats  that  is  not  shared  with
others  (SI  Appendix,  Fig.  S6C).  In  addition,  alignments  indicate
that  repeat  expansions  occur  in  the  same  locations,  consistent  with
our  xenograft  models.  Of  the  51  TCGA  samples,  46  had  annotated
overall  survival  data  that  we  analyzed  to  compare  tumors  with 
HSATII  gain  vs.  no  gain.  Notably,  Kaplan-Meier  analysis  demonstrated
a  significant  reduction  in  overall  survival  in  the  HSATII
gain  vs.  no-gain  tumors  (Fig.  5E;  median  overall  survival:  1,096  vs.
1,881  d;  log-rank  P  value  =  0.034).  Taken  together,  our  data  show
that  gene  copy  gains  at  HSATII-encoding  pericentromeric  repeats
and  other  cancer-enriched  satellites  occur  at  preexisting  repeat
arrays  and  are  a  common  and  negative  prognostic  feature  of  colorectal
cancers.
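
For readers less familiar with the survival analysis quoted above, the Kaplan-Meier (product-limit) estimate and the median overall survival derived from it can be sketched in a few lines. The follow-up times and event flags below are invented for illustration and are not the TCGA data; the log-rank comparison between groups is not reproduced here.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Product-limit survival estimate.

    durations: follow-up time for each patient (days).
    events: True if death was observed at that time, False if censored.
    Returns a list of (time, survival_probability) at each distinct death time.
    Patients censored at a given time are counted as at risk at that time.
    """
    deaths = Counter(t for t, observed in zip(durations, events) if observed)
    exits = Counter(durations)  # everyone leaving the risk set, by time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


def median_survival(curve):
    """First time at which estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached within follow-up


# Hypothetical cohort of six patients.
durations = [200, 400, 400, 900, 1100, 1500]
events = [True, True, False, True, True, False]

curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"day {t}: S(t) = {s:.3f}")
print("median overall survival (days):", median_survival(curve))
```

Applying the same estimator separately to the gain and no-gain groups would give the two curves compared in a figure of this kind.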

The  mechanism  underlying  HSATII  satellite  repeat  expansion 
in  cancer  remains  to  be  defined,  but  it  may  involve  an  RT/
reintegration  phenomenon  analogous  to  that  described
for  other  major  repetitive  elements,  such  as  LINE-1  (24)
and  telomeres  (25,  26).  Although  the  RT-directed  expansion  of
pericentromeric  sequences  has  not  been  described  in  human
cells,  there  is  ample  precedent  for  retroelement-mediated  integration
of  such  repeats  in  plants  (27,  28).  In  mammalian  cells,
integration  of  nonretroelement  sequences  within  centromeres 
has  not  been  reported,  with  the  exception  of  the  marsupial 


Fig.  6.  RT  blockage,  as  well  as  LNA-mediated  inhibition  of  HSATII  transcripts,
affects  tumor  sphere  growth,  impairs  tumorigenesis,  and  prevents
pericentromeric  copy  number  gains  in  vivo.  (A)  Proliferation  assay  on  DMSO-
and  ddC-treated  SW620  cells.  Values  (%)  were  normalized  against  the  signal
derived  from  viable  cells  on  the  day  of  seeding,  which  was  set  at  100%.
(B)  Tumor  sphere-forming  ability  of  SW620  cells  was  tested  upon  culture  in  the
presence  of  ddC  for  15  d.  Values  (%)  were  normalized  against  the  number  of
spheres  in  the  DMSO  control,  which  was  set  at  100%.  *P  <  0.05  (t  test).  (C)  In
vivo  tumor  growth  of  SW620  cell  xenografts.  Mice  were  treated  daily  by  i.p.
injection  of  25  mg/kg  ddC  or  vehicle  alone,  starting  1  wk  after  tumor  cell
injection.  Tumor  size  at  this  stage  was  set  at  1  to  calculate  relative  size  fold
change  over  time.  Error  bars  represent  SEM  (n  =  6).  *P  <  0.05  (t  test).  (D)  CNV
analysis  of  HSATII  by  qPCR  on  the  chr16-1  locus  in  tumors  recovered  from
untreated  (Vehicle)  or  ddC-treated  mice  (n  =  6).  Values  were  normalized  for
β-actin  and  expressed  as  HSATII/chr16q  arm  ratios.  *P  <  0.05  (t  test).  (E)
Proliferation  assay  on  SW620  cells  upon  transfection  with  an  HSATII-specific
or  Scramble  LNA.  Values  (%)  were  normalized  against  the  signal  derived
from  viable  cells  1  d  after  transfection,  which  was  set  at  100%.  (F)  Tumor
sphere-forming  ability  of  SW620  cells  was  tested  upon  transfection  with  an
HSATII-specific  or  Scramble  LNA.  Values  (%)  were  normalized  against  the
number  of  spheres  in  the  control,  which  was  set  at  100%.  *P  <  0.05  (t  test).  In
A,  B,  E,  and  F,  error  bars  represent  SD  (n  =  3).
tammar  wallaby,  whose  exceptionally  short  centromeres  harbor
signatures  of  retroviral  insertions  alongside  domains  of  satellite- 
rich  sequences  (29).  Thus,  by  analogy  with  retroviral  elements, 
LINE-1,  and  telomeres,  pericentromeric  repeats  may  expand 
through  the  activity  of  endogenous  RT  enzymes.  However,  we 
cannot  unequivocally  discriminate  between  candidate  RTs  or 
exclude  other  mechanisms,  including  replication-induced  errors 
or  epigenetic  modifier-induced  site-specific  copy  gain  (30-32). 

Suppression  of  HSATII  RT  Inhibits  Tumor  Growth  and  Impairs  Copy 
Number  Gains.  The  critical  role  of  pericentromeric  repeats  in 
preserving  chromosomal  integrity  is  well  established  (1).  However,
the  unexpected  RT-associated  mechanism  by  which  pericentromeres
are  expanded  in  cancer  cells  raises  the  possibility
that  its  disruption  may  affect  tumor  growth.  Given  the  inhibition 
of  HSATII  rdDNA  formation  by  treatment  of  cells  with  the 
nucleoside  analog  ddC,  we  first  tested  the  effect  of  this  NRTI  on 
cell  proliferation.  Treatment  of  SW620  cells  with  ddC  inhibited 
their  proliferation  under  3D  conditions  but  had  minimal  effect  in 
2D  culture  (Fig.  6  A  and  B).  Mouse  tumor  xenografts  generated 
from  SW620  cells  also  showed  sensitivity  to  ddC,  with  a  50% 
reduction  in  tumor  diameter  at  21  d  (P  =  0.03;  Fig.  6C).  The 
antiproliferative  effect  of  ddC  was  accompanied  by  a  reduction 
of  HSATII  copy  gain  in  tumor  xenografts  (Fig.  6D).  In  a  second
colon  cancer  cell  line,  HCT116,  the  growth  of  tumor  xenografts 
was  suppressed  using  a  combination  of  two  RT  inhibitors,  ddC 
and  2',3'-didehydro-2',3'-dideoxythymidine  (d4T),  but  not  using
ddC  alone,  and  this  effect  was  again  associated  with  a  reduced
HSATII  copy  number  gain  (SI  Appendix,  Fig.  S7  A-C).  To  test
an  alternative  strategy  to  target  HSATII,  we  synthesized  a  locked 
nucleic  acid  (LNA)  oligonucleotide  complementary  to  the 
HSATII  sequence.  Treatment  of  SW620  cells  with  this  HSATII- 
directed  LNA  had  no  effect  when  the  cells  were  grown  in  two 
dimensions,  but  it  had  a  strong  inhibitory  effect  on  tumor  sphere 
formation  (Fig.  6  E  and  F).  This  effect  was  associated  with  accumulation
of  cells  in  the  G0/G1  phase  of  the  cell  cycle  (SI
Appendix,  Fig.  S7D).  Taken  together,  these  results  raise  the
intriguing  possibility  that  targeting  HSATII  transcripts  or  suppressing
cellular  reverse  transcriptase  activity  may  selectively
suppress  proliferation  of  cancer  cells  under  anchorage-independent
conditions  (SI  Appendix,  Fig.  S1E).  Beyond  their
potential  contribution  to  the  viability  of  proliferating  cancer 
cells,  HSATII  transcripts  may  also  play  a  role  in  shaping  tumor- 
host  interactions,  as  shown  in  the  accompanying  paper  (33). 
Thus,  the  disruption  of  HSATII  RT  may  have  direct  effects  on 
cancer  cells,  as  well  as  modulating  the  immune  response  against 
tumor  cells.  Given  the  very  high  frequency  of  HSATII  deregulation
in  epithelial  cancers  (3),  such  a  therapeutic  vulnerability
might  have  broad  significance.

Materials  and  Methods 

Human  normal  and  tumor  tissues  were  deidentified,  discarded  excess
tissue  obtained  from  the  Massachusetts  General  Hospital  (MGH)  according  to  an
MGH  Institutional  Review  Board  (IRB)-approved  protocol  (2013P001854).  The  IRB
determined  consent  was  not  needed  for  this  study.  Total  RNA  from  normal 
human  pancreas  was  purchased  from  Clontech.  The  results  here  are  based,  in 
Given  that  it  is  not  standard  expression,  we  did  not  deposit  the  sequence  into

Further  experimental  details  are  provided  in  SI  Appendix.

Recent  studies  have  demonstrated  abundant  transcription  of  a  set  of
noncoding  RNAs  (ncRNAs)  preferentially  within  tumors  as  opposed  to 
normal  tissue.  Using  an  approach  from  statistical  physics,  we 
quantify  global  transcriptome-wide  motif  use  for  the  first  time,  to  our 
knowledge,  in  human  and  murine  ncRNAs,  determining  that  most 
have  motif  use  consistent  with  the  coding  genome.  However,  an 
outlier  subset  of  tumor-associated  ncRNAs,  typically  of  recent  evolutionary
origin,  has  motif  use  that  is  often  indicative  of  pathogen-associated
RNA.  For  instance,  we  show  that  the  tumor-associated
human  satellite  repeat  II  (HSATII)  is  enriched  in  motifs
containing  CpG  dinucleotides  in  AU-rich  contexts  that  most  of  the
human  genome  and  human-adapted  viruses  have  evolved  to  avoid.
We  demonstrate  that  a  key  subset  of  these  ncRNAs  functions  as  immunostimulatory
"self-agonists"  and  directly  activates  cells  of  the  mononuclear
phagocytic  system  to  produce  proinflammatory  cytokines.
These  ncRNAs  arise  from  endogenous  repetitive  elements  that  are 
normally  silenced,  yet  are  often  very  highly  expressed  in  cancers.  We 
propose  that  the  innate  response  in  tumors  may  partially  originate 
from  direct  interaction  of  immunogenic  ncRNAs  expressed  in  cancer 
cells  with  innate  pattern  recognition  receptors,  and  thereby  assign  a 
previously  unidentified  danger-associated  function  to  a  set  of  dark 
matter  repetitive  elements.  These  findings  potentially  reconcile  several
observations  concerning  the  role  of  ncRNA  expression  in  cancers
and  their  relationship  to  the  tumor  microenvironment. 

noncoding  RNA  |  genome  evolution  |  cancer  immunology 

The  recent  development  of  total  RNA  sequencing  has  allowed 
a  better  appreciation  of  the  complexity  and  breadth  of  the  entire
transcriptome  (1-4).  Analysis  by  the  Encyclopedia  of  DNA
Elements  (ENCODE)  consortium  unexpectedly  showed  that  far
more  of  the  mammalian  genome  than  previously  appreciated  is
transcribed  into  noncoding  RNA  (ncRNA).  Several  short  ncRNAs
have  conserved  metabolic  and  regulatory  functions,  and  some  antiviral
properties  have  been  assigned  to  novel  ncRNA  classes,  such
as  eukaryotic  siRNA,  PIWI-interacting  RNA  (piRNA),  and  prokaryotic
CRISPR  (clustered  regularly  interspaced  short  palindromic
repeats)  RNA  (5).  In  eukaryotes,  long  noncoding  RNA  (lncRNA),
such  as  long-intergenic  ncRNA,  has  been  associated  with  transcriptional,
posttranscriptional,  and  epigenetic  regulation  (6,  7).

It  is  now  evident  that  germ-line  and  cancer  cells  can  have 
atypical  ncRNA  transcription,  including  repetitive  elements  from 
regions  usually  silenced  in  steady  state  (8,  9).  In  eukaryotes,  transcription
of  endogenous  retroviruses  and  mobile  elements  is  mostly
repressed  epigenetically  through  processes  such  as  histone  modification
and  DNA  methylation,  preventing  disruptive  or  deregulatory
effects  due  to  integration  into  coding  regions.  In  mammals,
DNA  methylation  targets  the  cytosine  in  CpG  motifs  to  form 
5-methylcytosine  contributing  to  down-regulation  of  transcription 
for  methylated  sequences  (10).  Epigenetic  regulation  is  strongly 
associated  with  the  developmental  process,  whereas  its  deregulation,
such  as  by  disruption  of  DNA  methylation,  can  be  associated
with  dedifferentiation  and  carcinogenic  processes  (11,  12). 


In  cancers,  such  as  those  driven  by  p53  mutations  and
epigenetic  alterations,  ncRNA  associated  with  repetitive  elements
can  be  induced  (8,  9).  In  a  study  of  mouse  and  human  epithelial
malignancies  by  Ting  et  al.  (9),  several  repetitive  elements  emanating
from  genomic  dark  matter  and  often  repressed  in  steady-state
conditions,  particularly  pericentromeric  repeats,  such  as
GSAT  (major  satellite)  in  mouse  and  human  satellite  repeat  II
(HSATII)  in  humans,  were  only  transcribed  in  cancer  cells.  Leonova
et  al.  (8)  demonstrated  a  strong  induction  of  repetitive  elements
from  the  mouse  genome  (particularly  GSAT,  B1,  and  B2),  along
with  several  other  ncRNAs,  in  cells  bearing  p53  oncogenic  mutations
and  exposed  to  epigenome-altering  demethylating  agents.  Anomalous
expression  of  the  murine  repetitive  element  GSAT  was  shown
to  trigger  transcription  of  the  repeat-dependent  activated  IFN  response,
which  can  regulate  apoptosis-related  cell  death.  Similarly,
when  expressed,  endogenous  retroviral  RNA  can  activate  the  innate 
immune  response  via  several  pathways  (13).  Altogether,  these 
studies  suggest  that  certain  ncRNAs  may  also  have  attributes  of 
immunostimulatory  nucleic  acid  sequences. 

We  use  a  set  of  mathematical  tools  originally  developed  to 
analyze  potentially  immunostimulatory  motif  use  in  viral  and  host 
genome  coding  sequences.  These  methods  were  recently  recast  in 
the  language  of  statistical  physics  and  are  extended  here  to  analyze 
ncRNA  motif  use  (14,  15).  We  analyze  for  the  first  time,  to  our 
knowledge,  large-scale  patterns  of  motif  use  in  human  and  murine 


Significance 

Using  an  approach  derived  from  statistical  physics,  we  quantify 
transcriptome-wide  motif  usage  in  human  and  murine  noncoding 
RNAs  (ncRNAs),  determining  that  most  have  motif  usage  consistent
with  the  coding  genome.  However,  an  outlier  subset  of
tumor-associated  ncRNAs  comprises  repetitive  elements  whose
motif  usage  patterns  are  more  typically  associated  with  the  genomes
of  inflammatory  pathogens.  We  demonstrate  that  a  key
subset  of  these  elements  directly  activates  the  cellular  innate
immune  response.  We  propose  that  the  innate  response  in  tumors
partially  originates  from  direct  interaction  of  immunogenic
ncRNAs  preferentially  expressed  in  cancer  cells  with  innate  pattern
recognition  receptors.
transcriptomes,  which  we  use  to  find  anomalies  in  ncRNA  expressed
in  cancer  transcriptomes  (5,  16).  As  a  result,  we  are  able
to  characterize  features  of  ncRNA  overexpressed  in  cancerous  cells
relative  to  normal  cells  (8,  9,  17).  Our  analysis  includes  several
large  datasets  of  functionally  characterized  ncRNA,  in  addition  to
pseudogenes  and  repetitive  elements,  such  as  satellite  DNA,  endogenous
retroviruses,  and  long  and  short  interspersed  elements.
We  demonstrate  that  many  ncRNAs  preferentially  expressed  in  cancerous
cells  display  anomalous  motif  use  patterns  compared  with
the  vast  majority  of  ncRNAs,  whose  patterns  of  motif  use  we  show
to  be  consistent  with  those  in  coding  regions.
Based  on  their  unusual  pattern  of  motif  use  and  differential  expression
in  cancerous  vs.  normal  cells,  we  predicted  that  HSATII
and  GSAT  incorporate  immunostimulatory  motifs  in  humans  and
mice,  respectively.  Remarkably,  we  validate  our  prediction,  demonstrating
that  both  directly  stimulate  antigen-presenting  cells,  and
accordingly  label  them  immunostimulatory  ncRNAs  (i-ncRNAs).

Results 

General  Motif  Use  Patterns  in  lncRNAs.  Using  the  GENCODE  database
of  lncRNA  transcripts  from  humans  and  mice  (versions  19
and  2  for  humans  and  mice,  respectively),  we  calculated  the
strength  of  statistical  bias  (referred  to  as  a  force)  on  sequence
motif  use  for  all  contained  lncRNAs  as  described  in  Materials  and
Methods.  GENCODE  lncRNA  established  a  baseline  of  sequence
motif  use  expressed  in  a  broad  array  of  cells  and  tissues  so  that  we
could  compare  these  patterns  of  motif  use  with  those
of  ncRNAs  expressed  in  certain  cancers.  For  each  sequence,
we  calculate  the  force  on  all  two-  and  three-nucleotide
motifs  and  use  Eq.  5  in  Materials  and  Methods  to  calculate  the
probability  of  observing  a  sequence  with  that  number  of  motifs.
The  number  of  sequences  in  GENCODE  for  which  a  given  dinucleotide
is  aberrantly  expressed  is  illustrated  in  Fig.  1A.  CpG
dinucleotides  are  vastly  underrepresented,  as  indicated  by  their
negative  forces  in  SI  Appendix,  Table  S1.  UpA  dinucleotides  are
often  underrepresented,  although  to  a  lesser  extent.  As  in  our
previous  work,  these  patterns  cannot  be  explained  by  nucleotide
frequencies,  such  as  guanine-cytosine  (GC)  content,  which  are
accounted  and  normalized  for  in  our  method.

These dinucleotide motif use patterns are similar in human
and mouse genomes across the wide array of cells and cell lines
contained in GENCODE (2, 3). Strikingly, avoidance of the CpG
and UpA dinucleotide motifs in this dataset is stronger than in
coding regions (SI Appendix, Fig. S1). One can conclude that the
patterns previously observed in virus and host coding genes are
not due to effects from coding regions, such as codon use patterns
(18-20). Rather, such constraints in coding regions likely
weaken the strength of a statistical bias that comes from the
same underlying mechanisms. This pattern suggests selective
restrictions on dinucleotide frequencies observed in ncRNAs
preserving a function or avoiding a detrimental consequence,
such as a chronic autoinflammatory response that could result
from presenting danger-associated molecular patterns (DAMPs).
Adaptation of dinucleotide motif use in these elements over time
is analogous to the viral mimicry of host patterns of sequence
motif use (14, 21). When an avian influenza virus enters the
human population, one can observe adaptation to analogous
patterns emerging over time (14, 15, 22, 23). In that case, mutation
rates in influenza are very high, so one can follow these
evolutionary adaptations over far shorter time periods.

Fig. 1. ncRNAs expressed in cancer differ from general lncRNA motif use
patterns. (A) Fraction of GENCODE human lncRNA sequences where a motif
occurs the expected number of times, as defined by a probability
greater than 0.05 (Eq. 5). (B) Fraction of GENCODE lncRNA sequences
in humans (Hs) and mice (Mm) where CpG motifs occur the expected
number of times compared with the CpG motifs expressed in human cancerous
cells and mouse cancer cell lines.

Trinucleotide motifs with significant forces are listed in the SI
Appendix, Table S1, along with dinucleotide motifs. Trinucleotide
motifs with significant forces acting on them are conserved between
humans and mice, as was the case for dinucleotides, with the exception
of UAC and UAG (which are significant in humans but less
so in mice). Except for UAG (a chain termination codon used in
coding RNAs), whenever a trinucleotide motif is significantly enhanced
or avoided in humans, its reverse complement is also
significantly enhanced or avoided, suggesting avoidance of complementary
motifs. The strongest forces suppress CpG and CpG-containing
trinucleotides, particularly when an A or U is next to the core
CpG motif. These results are consistent with the avoidance of CpGs
in AU contexts observed in influenza viruses replicating in humans
(15, 22, 23). Given the apparent bias against CpG and UpA, we
sought to determine if these motifs were linked. Pearson correlation
between these forces across all GENCODE ncRNA in humans and
mice showed no correlation between CpG and UpA biases (r =
0.0006; SI Appendix, Fig. S2). Therefore, the forces on CpG and
UpA are likely independent. Moreover, every significant trimer
across GENCODE is correlated to CpG, UpA, or both. As a
result, all significant trimers can be explained by their CpG or UpA
motif use.
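
The independence check described above can be illustrated with a short sketch. The Python below computes a Pearson correlation coefficient from two lists of per-sequence forces. The function name pearson_r and the toy force values are hypothetical; they are not taken from the GENCODE analysis.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sample has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-sequence forces (toy numbers, not GENCODE values).
cpg_forces = [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
upa_forces = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]
print(pearson_r(cpg_forces, upa_forces))
```

A coefficient near zero, as reported in the text, indicates no linear relationship between the two forces; it does not by itself rule out nonlinear dependence.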

Cancer-Enriched Noncoding Repeat RNA May Have Anomalous Motif
Use. Prior work revealed aberrant expression of ncRNA across a
spectrum of mouse and human cancers (8, 9). These sequences
were found in the Repbase database of human and murine repetitive
elements and the Functional Annotation of Mouse
(FANTOM) database of murine noncoding elements (currently
NONCODE) (24, 25). We also found high induction of GSAT in a
murine testicular teratoma and liposarcoma tumor model (8, 9) (SI
Appendix, Fig. S3). Focusing on these cancer-expressed repeats, we
found a surprisingly significant enrichment of anomalous motif use
patterns compared with other ncRNAs. In the Repbase database,
we tested whether the bias on dinucleotide and trinucleotide motifs
observed in repetitive element sequences fell outside the distribution
obtained from GENCODE lncRNA. Remarkably, we found
hundreds of sequences falling outside of this distribution. Many
have high use of CpG dinucleotides, including a set of endogenous
viruses (SI Appendix, Table S2) recently implicated in the innate
immune response in tumors (13). We conclude that although the
portions of the noncoding regions typically expressed as lncRNAs
have motif use patterns similar to RNA from coding regions, there
are many genomic regions with atypical motif use that are not
transcribed in normal cells or tissues.

We use the forces that quantify the strength of the statistical bias
on the often underrepresented CpG and UpA dinucleotides to
differentiate between ncRNAs found preferentially in cancerous
cells and the total lncRNA referenced in GENCODE for humans
and mice, because these two dinucleotides essentially account for
all significant trinucleotide motifs in this set. We use the distribution
of forces on CpG and UpA to define a null hypothesis,
which we approximate by a Gaussian distribution (Fig. 2). Many
ncRNAs from cancerous cells are clearly outside the distribution,
often to a large extent. In particular, HSATII, the main ncRNA
up-regulated in human pancreatic cancers, is far outside the human
distribution, and GSAT, the main murine ncRNA implicated in
murine tumoral cell lines, is well outside the mouse distribution.
Within our null hypothesis, the P values for all ncRNAs considered
here are less than 10^-61 for human pancreatic cancer
data and less than 10^-2 for murine cell line data.
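
One way to picture such a Gaussian null is sketched below in Python. It assumes, for illustration only, that the CpG and UpA forces of the reference set are summarized by a two-dimensional Gaussian, and that the P value of a candidate is the probability mass lying outside the ellipse that passes through it. The text does not specify the exact test used, and all function names and numbers here are hypothetical.

```python
import math


def fit_gaussian_2d(points):
    """Sample mean and covariance (sxx, sxy, syy) of 2D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance of point from a 2D Gaussian."""
    sxx, sxy, syy = cov
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def tail_probability(point, mean, cov):
    """Mass of a 2D Gaussian outside the ellipse through point.

    For two dimensions the squared distance is chi-square distributed
    with 2 degrees of freedom, whose survival function is exp(-d2 / 2).
    """
    return math.exp(-0.5 * mahalanobis_sq(point, mean, cov))


# Hypothetical reference forces (CpG, UpA) and one hypothetical candidate.
reference = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
mean, cov = fit_gaussian_2d(reference)
print(tail_probability((4.0, 3.0), mean, cov))
```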

Many of the ncRNAs from the studies of Leonova et al. (8) and
Ting et al. (9) are outliers of at least three SDs with respect to at
least one of the significant motifs implicated in the previous section,
accounting for a median of 70.86% of the modulated Repbase
RNA expression induced in pancreatic cancer, along with even
higher percentages (73.95% and 85.74%, respectively) in the
smaller sets of prostate and lung cancers. HSATII is the most
differentially expressed (by a considerable margin) in the pancreatic
cancer data, and HSATII and BSR are the highest in prostate and
lung cancer data. In p53 KO murine cell lines treated with demethylation
agents, around 68 ncRNAs are significantly modulated
(8). Among those ncRNAs, 79.03% of the total expression comes
from outliers as defined above, with the vast majority coming from
GSAT and B2. Overall, we observed that repetitive sequences
containing unusual motif use had varying degrees of conservation.
However, the subset preferentially expressed in cancerous cells and
tissues is encoded by sequences of more recent evolutionary origin.
HSATII and GSAT are only conserved back to primates and mice,
respectively, and 21 of the 22 ncRNAs from the study of Ting et al.
(9) are conserved in humans and primates but extend no further
back in evolution. Any function is likely to be species-specific.

ncRNAs with Unusual Motif Use Highly Expressed in Cancers Are
Immunostimulatory. Our analysis highlights that many ncRNAs
up-regulated in cancer display abnormal nucleotide motif use that
we had previously related to immunogenic properties in viruses.
The innate immune system contains several effector cells that react
to immunogenic nucleic acids, such as exogenous viral and bacterial
nucleic acids, as well as endogenous nucleic acids that can be
released upon cell death (6). Among those effectors, the mononuclear
phagocytic system [macrophages, monocytes, and dendritic
cells (DCs)] contains key regulators of innate immune activation
and adaptive immunity (26-28). DCs efficiently sense and sample
their environment to integrate information and mount a proper
response, which may be tolerogenic or immunogenic. To test
whether ncRNA with highly unusual motif use could be recognized
as a DAMP by some nucleic acid-sensing pattern recognition receptors
(PRRs), we studied the effect of human HSATII and
murine GSAT following transfection in human monocyte-derived
DCs (moDCs) and murine bone marrow-derived macrophages.
Liposomal transfection was required for stimulation, whereas naked
RNA had no effect, which is consistent with
recognition via an endosomal or intracellular sensor (SI Appendix,
Fig. S4). The general sets of recognition pathways tested are indicated
in the SI Appendix, Fig. S5.

We generated different ncRNAs by in vitro transcription using
minigenes coding for the two main candidate outliers computationally
predicted to have immunogenic motif use (HSATII and
GSAT). As controls, we derived RNA from minigenes encoding
scrambled (sc) versions with the same nucleotide content but having
normal motif use (labeled HSATII-sc and GSAT-sc) and repetitive
elements of comparable length but having normal motif use patterns
(RMER16A3 and UCON38), as described in SI Appendix. In
human moDCs, liposomal transfection of HSATII induced significant
production of IL-6, IL-12, and TNF-alpha relative to both
endogenous controls and their scrambled versions (Fig. 3A). A
similar profile of cytokines was elicited by moDCs in response to
selected Toll-like receptor (TLR) agonists (SI Appendix, Fig. S6A).
The candidate murine immunogenic ncRNA, GSAT, had less
pronounced immunogenic properties but still induced IL-12 (Fig.
3A). Upon liposomal transfection of the same ncRNA into immortalized
murine bone marrow-derived macrophages (imBMs),
the immunogenic properties of HSATII were strongly attenuated,
whereas the murine GSAT induced high levels of TNF-alpha (Fig.
3B) and monocyte chemotactic protein 1 (MCP-1), but not IFN-gamma,
IL-6, or IL-12. The imBM almost exclusively regulates
TNF-alpha in response to PRR agonists (SI Appendix, Fig. S6B).

HSATII and GSAT ncRNA induced IL-12 in human moDCs
similar to the TLR3 ligand poly-IC (a synthetic dsRNA mimic; SI
Appendix, Fig. S5). The absence of an effect by ncRNA with normal
motif use [i.e., the scrambled forms (Fig. 3 A and B)] suggests
specific sequence patterns within the RNA, such as CpG and UpA
motifs, regulate immunostimulatory activity. Such motif use could
also influence secondary conformations that may contribute to
immunogenic properties, although we checked that the scrambled
sequences did not lower the RNA minimum folding energy. Based
upon these observations, we refer to HSATII and GSAT as immunogenic
ncRNA or i-ncRNA. Interestingly, our study corroborates
previous findings by Leonova et al. (8) that ncRNA, such as
GSAT, can induce an innate response, although in those studies,
the type I IFN pathway was also activated. Our initial investigations
into this pathway were inconclusive (SI Appendix, Fig. S6C).

Fig. 2. ncRNA from cancer cells contains outliers
from normal motif use. Distribution of UpA and CpG
bias in lncRNA taken from human tumors (A) and
murine cell lines (B) (indicated in red) plotted
against lncRNA from GENCODE (indicated in gray).
Each ellipse indicates 1 SD from the mean value in
the GENCODE dataset. The forces on CAG and CUG
are also shown for human tumors (C) and murine
cell lines (D).

Fig. 3. i-ncRNA stimulates human moDC cytokine
production. Quantification of inflammatory cytokine
production in human moDCs (A) and murine
imBMs (B) upon liposomal transfection of human
i-ncRNA (HSATII) and murine i-ncRNA (GSAT) vs.
their scrambled and endogenous controls. Each
point represents the mean value of the experimental
replicates for each individual condition; the bar
represents the median. The significance of i-ncRNA
stimulation is analyzed by the nonparametric
Mann-Whitney test to compare their effect vs. their
scrambled and endogenous controls. NS, not significant.
*P < 0.05; **P < 0.01.

Dissection of the Immunostimulatory Properties of i-ncRNA. Pathogen-associated
molecular patterns and DAMPs activate innate immune
cells through PRRs. To better characterize the mechanisms
involved in sensing i-ncRNA, we studied the immunomodulatory
properties of HSATII and GSAT on a panel of imBMs that lack
specific PRRs or effector molecules in their downstream signaling
pathways (SI Appendix, Fig. S5). Whereas GSAT induced a TNF-alpha
response, HSATII did not induce differential cytokine expression
in these immortalized cells, indicating that there is either
a species-specific effect, because the cells are murine, or a cell
type-specific effect, because these cells are macrophages. This
result is perhaps unsurprising, because different species and cell
types express different PRRs, and HSATII and GSAT have
different sequence compositions. Significantly, the absence of two
key adaptor and regulatory proteins, MYD88 and UNC93B1:
UNC93B3d (UNC93b), respectively, eliminated the differential response
to GSAT in imBMs (Fig. 4).

MYD88 is a key cytosolic adaptor protein that is used by all
TLRs except TLR3 to activate the transcription factor NF-κB.
Similarly, the mutated form of UNC93b essentially eliminated
inflammatory responses in imBMs. Although less well characterized
than MYD88, this protein is known to interact with
several endosomal TLRs (TLR3, TLR7, and TLR9) and has
been implicated in TLR trafficking between the endoplasmic
reticulum and endosomes, and their resultant maturation (29-31).
We tested the requirement for TLR3, TLR7, and TLR9,
which are known to recognize dsRNA, ssRNA, and CpG DNA,
respectively (32-34) (SI Appendix, Fig. S1A and S8). None of
these receptors were required for GSAT to activate TNF-alpha
production from imBMs. Additional pathways investigated, including
the stimulator of IFN genes (STING) and inflammasome
pathways, are discussed in SI Appendix and did not contribute to
i-ncRNA stimulatory activity. Altogether, our data are consistent
with a requirement for i-ncRNA activation through signaling
pathways that rely upon MYD88 and UNC93b. The precise receptor
involved in initial recognition remains to be determined.


Fig. 4. MYD88 and UNC93b control GSAT i-ncRNA stimulation. Genetic screen of the innate immune pathway related to i-ncRNA function in murine imBMs.
The imBM cells of different genotypes (WT, MYD88 KO, and UNC93b3d/3d MUT) were stimulated by liposomal transfection (DOTAP liposomal transfection
reagent) of the murine i-ncRNA (GSAT). TNF-alpha (TNF-a) production in the supernatant was quantified, and each point represents the mean
value of the experimental replicates for each individual condition; the bar represents the median. *P < 0.05.

Discussion

There  is  a  surprising  similarity  to  be  drawn  between  foreign  viral 
nucleotide  sequences  and  select  ncRNAs  silent  in  normal  cells,  yet 
transcribed in cancer cells, activating innate immunity (23, 29, 35-37).
We  determined  that  ncRNAs  expressed  predominantly  in  normal 
cells  from  humans  and  mice  reflect  patterns  of  nucleotide  sequence 
motif  avoidance,  such  as  underrepresentation  of  CpG-containing 
sequences  and  reduced  UpA,  similar  to  protein-coding  RNA.  Such 
patterns  often  include  a  many-fold  underrepresentation  of  CpG- 
containing  sequences  and  reduced  UpA  motif  use  compared  with 
expected levels. However, the genome also harbors repetitive elements,
which often have abnormal use of CpG and UpA motifs
compared with that observed in RNA
expressed in normal cells and tissues. Sets of these ncRNAs, typically
newer  genome  entries  over  evolutionary  time  scales,  can  be  expressed 
at  very  high  levels  in  cancerous  cells  and  tumors.  As  a  result,  human 
and  mouse  elements  expressed  in  cancer  cells  can  have  different 
sequences  but  can  share  high  CpG  content  and  are  not  generally 
observed  in  the  human  or  mouse  transcriptome  in  normal  cells. 

We previously proposed that immunostimulatory and proinflammatory
properties of highly inflammatory influenza and other
RNA viruses derive, in part, from RNA containing CpGs in
AU-rich contexts, which are avoided in RNA viruses circulating in
humans. Experimental evidence has supported this hypothesis (23,
38, 39). Recently we recast our analysis in the language of statistical
physics in a way that is theoretically insightful and computationally
efficient (15). In this language, the evolution and optimization of
nucleotide sequence motifs are driven by the interplay between
selective and entropic forces. The latter randomize motif frequencies
in a genome under constraints, whereas the former are
largely Darwinian, optimizing for functions enhancing viral replication
and spread. However, ncRNAs transcribed mostly in cancerous
cells would not be exposed to the same selective and
entropic forces as coding RNAs and ncRNAs transcribed in normal
cells. Based on motif use patterns, we predicted many ncRNAs
may have immunogenic properties, presenting DAMPs.

We  focused  experimentally  on  HSATII  and  murine  GSAT, 
because they are preferentially and highly expressed in carcinogenic
processes and exhibit abnormal patterns of motif use. In
particular,  human  HSATII  is  enriched  in  CpG  motifs  in  AU-rich 
contexts  avoided  in  genomes  of  humans  and  human-adapted 
viruses.  We  demonstrate  that  their  computationally  predicted 
immunogenic  properties  lead  to  the  induction  of  inflammatory 
cytokines  in  human  and  murine  innate  cells  (Fig.  3  A  and  B).  Our 
observations,  together  with  previous  work  by  Leonova  et  al.  (8), 
strongly  suggest  that  these  endogenous  i-ncRNAs  are  recognized 
as  DAMPs  by  cellular  nucleic  acid  PRRs. 

We identified a key role for MYD88 and UNC93b as regulators
of GSAT immunogenicity, but without evidence for the
common endosomal nucleic acid sensors typically regulated by
UNC93b or associated with the MYD88 adaptor (TLR2, TLR4,
TLR7, and TLR9). Our results indicate that in the murine imBM
background, there is potent induction of TNF-alpha. Further
studies will be required to elucidate whether TLR13, which has
been identified in murine cells and recognizes ribosomal bacterial
and viral RNA, is involved, or whether there exist intracellular
sensors of i-ncRNA associated with MYD88 (40-42),
as there are for dsDNA (DHX-9 or DHX-36) (43). Interestingly,
we find by alignment that GSAT contains a subsequence conserved in
immunogenic RNA isolated from bacterial ribosomal RNA,
which specifically activates murine TLR13 (41).

Activation  of  innate  immune  signaling  can  contribute  to  either 
carcinogenesis  or  antitumoral  immunity.  TLR  signaling  and 
MYD88  have  been  associated  with  tumor  development  (44). 
Given  that  HSATII  and  GSAT  expression  has  been  found  to  be 
pervasive  in  many  tumor  types  and  induces  responses  that  differ  by 
species  or  cell  type,  the  role  of  i-ncRNA  in  tumorigenesis  is  likely 
dependent  on  the  particular  RNA  expressed  and  other  properties 
of  the  tumor  microenvironment.  For  instance,  HSATII  activates 
macrophages  and  monocytes  in  our  study,  suggesting  it  may  be  a 
mechanism for attraction and retention of tumor-associated
macrophages. These macrophages have consistently been shown
to be a poor prognostic factor in cancer, leading to increased tumorigenesis,
metastasis, and immunoevasion (45). Under this hypothesis,
HSATII is used by the tumor to keep macrophages in the
tumor microenvironment while driving out T cells. Interestingly,
the viral-like behavior of HSATII transcripts is found not only in
the immune response to these elements but also in their ability to
reverse-transcribe in cancer cells, akin to retroviruses (46).

Fig. 5. Motif use in HSATII and GSAT clusters with foreign RNA. A comparison
of the forces on CpG dinucleotides is plotted against the distribution
of forces on all GENCODE lncRNA relative to a sequence's nucleotide bias.
The force on CpG dinucleotides for HSATII and GSAT is shown on the distribution,
along with the average values for the longest gene (PB2) in human
influenza B and avian H5N1 and all Escherichia coli coding regions.

The i-ncRNA, not subject to the same forces as ncRNA transcribed
in steady state, may retain or evolve to mimic features of
foreign RNA, as seen by comparing HSATII and GSAT with typical
human ncRNA and foreign genomic material in Fig. 5 (15, 47).
Indeed, HSATII and GSAT cluster more closely, in terms of motif
use patterns, with bacterial rather than human RNA. Such RNA
may have been selected to identify and eliminate cells when their
epigenetic state is disrupted. Essentially self-“junk” RNA may have
been maintained or may have evolved to mimic non-self-pathogen-associated
patterns to create a danger signal. We propose that such a
mechanism would be a previously unidentified aspect of “genetic
mimicry,” where the host is, for all practical purposes, mimicking
pathogen-associated nucleic acid patterns. HSATII and GSAT emanate
from the pericentromeres, which harbor new repetitive elements
with no known function (48). This region, unlike centromeres
or regions critical for structure or regulation, may dynamically produce
unusual repetitive elements that can adapt to a particular organism’s
PRRs. Our studies indicate that under the “extraordinary”
circumstances where these repetitive elements are expressed, they
could play a critical role in the regulation of immune responses
against cancer.

Materials  and  Methods 

We consider an RNA sequence of length L, hereafter called S_0, and a motif m [a
series of contiguous nucleotides (e.g., CpG)]. Our objective is to define a
probabilistic model over the set of the 4^L sequences, S = (s_1 s_2 ... s_i ... s_L), such
that the average value of the number, N_m(S), of occurrences of the motif m in
S coincides with the number, N_m(S_0), of occurrences of that motif in S_0. To do
so, we consider a random-nucleotide model, where nucleotides are independently
distributed according to the frequencies f^0(s), with s = A, C, G, U,
found in S_0. We then introduce the weakest bias that allows us to reproduce
N_m(S_0) on average.

The probability of a sequence S in this least-constrained, maximum entropy
model is

$$P(S \mid x, m) = \frac{1}{Z_m(x)} \prod_{i=1}^{L} f^0(s_i)\, \exp\bigl(x\, N_m(S)\bigr), \tag{1}$$

where

$$Z_m(x) = \sum_{\text{sequences } S} \; \prod_{i=1}^{L} f^0(s_i)\, \exp\bigl(x\, N_m(S)\bigr) \tag{2}$$

ensures the probability is correctly normalized. Parameter x, referred to as a
selective force (or just force) on the motif m, introduces a statistical bias over
P (15). The force quantifies the strength of statistical bias, which may be due
to selection on a motif. In the absence of bias (x = 0), the probability of S
simplifies to the product of its nucleotide frequencies, and the number of
motifs is what one would expect in a typical sequence with nucleotide frequencies
given by f^0(s). Positive values for x push the distribution toward
sequences with N_m(S) larger than what one would expect, whereas negative
values for x favor sequences with a smaller N_m(S) than expected.

The value of the force, x(S_0), is computed by maximizing the probability
P(S_0|x, m) of the sequence S_0 over x. This calculation is equivalent to finding
the value of x such that the average number of motifs,

$$\langle N_m \rangle(x) = \sum_{\text{sequences } S} P(S \mid x, m)\, N_m(S) = \frac{\partial \log Z_m}{\partial x}(x), \tag{3}$$

equals N_m(S_0). By scanning the sequences S_0 in the GENCODE database, we
obtain the forces x(S_0) shown in Fig. 2.

The logarithm of the number of sequences having N_m(S) repetitions of m
is bounded from above by the entropy of the random-nucleotide model; the
equality is reached in the absence of bias only (x = 0). The difference between
those entropies is the entropy cost corresponding to the constraint on
the average number of occurrences of m, and is denoted by σ_m. It is the
Legendre transform of log Z_m(x) (Eqs. 2 and 3):

$$\sigma_m = x(S_0)\, N_m(S_0) - \log Z_m\bigl(x(S_0)\bigr). \tag{4}$$

Efficient computational techniques allow us to calculate the sum over the 4^L
sequences in Eq. 2 in a time growing only linearly with L.
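
A minimal sketch of one such linear-time calculation, for a dinucleotide motif, is given below in Python. It is an illustration under stated assumptions rather than the implementation used here: the recursion over the last nucleotide (a transfer-matrix sum), the finite-difference derivative, the bisection bounds, and every function name are choices made for this example.

```python
import math

ALPHABET = "ACGU"


def count_motif(seq, motif):
    """Number of (possibly overlapping) occurrences of motif in seq."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def log_z(freqs, length, motif, x):
    """log Z_m(x) of Eq. 2 for a dinucleotide motif, in O(length) time.

    v[s] holds the summed weight of all prefixes ending in nucleotide s.
    It is renormalized at every step to avoid underflow or overflow.
    """
    first, second = motif
    boost = math.exp(x) - 1.0  # extra weight when the motif is completed
    v = {s: freqs[s] for s in ALPHABET}
    log_scale = 0.0
    for _ in range(length - 1):
        total = sum(v.values())
        new_v = {}
        for t in ALPHABET:
            weight = total + boost * v[first] if t == second else total
            new_v[t] = freqs[t] * weight
        norm = sum(new_v.values())
        log_scale += math.log(norm)
        v = {t: val / norm for t, val in new_v.items()}
    return log_scale + math.log(sum(v.values()))


def mean_count(freqs, length, motif, x, h=1e-4):
    """Average motif count (Eq. 3) as a central difference of log Z."""
    up = log_z(freqs, length, motif, x + h)
    down = log_z(freqs, length, motif, x - h)
    return (up - down) / (2.0 * h)


def force(seq, motif, lo=-10.0, hi=10.0, tol=1e-6):
    """Force x(S_0): solves mean_count(x) = N_m(S_0) by bisection.

    Assumes the solution lies in [lo, hi]; a sequence with zero motif
    occurrences has no finite solution and returns a value near lo.
    """
    seq = seq.upper().replace("T", "U")
    freqs = {s: seq.count(s) / len(seq) for s in ALPHABET}
    target = count_motif(seq, motif)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_count(freqs, len(seq), motif, mid) < target:
            lo = mid  # the mean count increases with x, so move up
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Toy sequences (hypothetical), both with 25% of each nucleotide.
depleted = "CG" + "G" * 19 + "C" * 19 + "AU" * 20
enriched = "CG" * 20 + "AU" * 20
print(force(depleted, "CG"), force(enriched, "CG"))
```

The two toy sequences share the same nucleotide composition, so the difference in their forces reflects motif arrangement only, which is the property the force is designed to isolate.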

Our aim is to find anomalous motif use in a sequence where the number of
motif occurrences is different from what is expected by chance in the random-nucleotide
model (i.e., associated with a significant nonzero force). We express
the likelihood of observing the natural sequence S_0 with a given motif
count as

$$P(S_0 \mid m) = \max_{x} \bigl[ P(S_0 \mid x, m) \bigr] = e^{\sigma_m} \prod_{i=1}^{L} f^0(s_i^0). \tag{5}$$

This likelihood is therefore directly related to the entropic cost: The larger the
cost, the more likely is the motif to be statistically significant.
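
For readers who want the intermediate step, the relation between the model probability, the condition that fixes the force, and the entropic cost can be written out as follows. This is a short worked derivation from the definitions above, with σ_m denoting the entropy cost; it is not additional material from the analysis.

```latex
\begin{align}
\log P(S_0 \mid x, m)
  &= \sum_{i=1}^{L} \log f^0(s_i^0) + x\, N_m(S_0) - \log Z_m(x), \\
\frac{\partial}{\partial x} \log P(S_0 \mid x, m)
  &= N_m(S_0) - \frac{\partial \log Z_m}{\partial x}(x) = 0
  \;\Longrightarrow\;
  \langle N_m \rangle\bigl(x(S_0)\bigr) = N_m(S_0), \\
\log P(S_0 \mid m)
  &= \sum_{i=1}^{L} \log f^0(s_i^0)
   + \underbrace{x(S_0)\, N_m(S_0) - \log Z_m\bigl(x(S_0)\bigr)}_{\sigma_m}.
\end{align}
```

Because log Z_m(0) = 0, evaluating the maximized expression at x = 0 shows that σ_m is never negative, and it vanishes when the force itself is zero.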
SUPPLEMENTARY METHODS AND EXPERIMENTS


Design of Experimental Controls. For HSATII and GSAT, negative controls were
designed in two ways, and both negative controls were compared to HSATII and GSAT
for all experiments. First, full RNA sequences of both satellites were randomly permuted
until scrambled sequences were generated that fell within one half of a standard
deviation from the mean value of the strength of statistical bias against CpG and UpA
dinucleotides for humans and mice, respectively. These sequences are denoted as
HSATII-sc and GSAT-sc. In other words, these sequences had the same length and
nucleotide content as HSATII and GSAT but fell within the inner ellipse in Figure 2a
(HSATII-sc) and Figure 2b (GSAT-sc). In addition, we checked that in both cases the
minimum RNA folding energy was not lowered during the scrambling process, so that
our permutations did not seem to produce more RNA secondary structure, which would
create the possibility of innate immune stimulation via TLR3. The free energy was
calculated using the MATLAB RNAfold routine (1, 2). Second, we created endogenous negative
controls by searching Repbase for the repetitive elements that fell within one standard
deviation of the mean bias against CpG and UpA in humans and mice but were also
closest in length to HSATII and GSAT. These were UCON38 for HSATII and
RMER16A3 for GSAT.
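
The permutation step can be sketched as a simple rejection loop. The Python below is an illustration only: it shuffles a sequence until a user-supplied bias score falls within a tolerance of a target value. The score used in the example, cpg_log_ratio, is a crude stand-in for the force defined in Materials and Methods, and the target, tolerance, seed, and function names are all hypothetical.

```python
import math
import random


def cpg_log_ratio(seq):
    """Log of observed over expected CpG count (a proxy, not the force)."""
    n = len(seq)
    observed = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    expected = (n - 1) * (seq.count("C") / n) * (seq.count("G") / n)
    # The 0.5 pseudocount keeps the ratio finite when a count is zero.
    return math.log((observed + 0.5) / (expected + 0.5))


def scramble_until(seq, score, target, tol, rng, max_tries=10000):
    """Permute seq until |score - target| <= tol.

    Shuffling preserves length and nucleotide content by construction.
    """
    letters = list(seq)
    for _ in range(max_tries):
        rng.shuffle(letters)
        candidate = "".join(letters)
        if abs(score(candidate) - target) <= tol:
            return candidate
    raise RuntimeError("no acceptable permutation found")


# Toy CpG-rich sequence (hypothetical), scrambled toward an unbiased score.
original = "CG" * 20 + "AU" * 20
scrambled = scramble_until(original, cpg_log_ratio, 0.0, 0.5, random.Random(0))
print(cpg_log_ratio(original), cpg_log_ratio(scrambled))
```

In the procedure described above, the acceptance test would use the CpG and UpA forces and the half-SD window around the GENCODE mean, followed by the separate folding-energy check.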

GSAT  RNA  Expression  Level  Detection.  GSAT  RNA  expression  levels  were 
investigated  by  a  custom  Taqman  Assay  in  normal  mouse  tissue  versus  mouse  tumor 
tissue  samples  (Supplementary  Figure  3).  The  tumor  mouse  models  that  were 
investigated  were  a  model  of  testicular  teratoma  (p53-/-  129/Svsl)  and  a  model  of 
liposarcoma  (p53LoxP/LoxP;PtenLoxP/LoxP).  In  all  instances  GSAT  levels  were  increased  in 
the tumor samples as compared to normal samples, although to varying degrees. There
was  no  significant  difference  in  GSAT  levels  between  tumors  arising  in  females  versus 
those  arising  in  males  in  the  liposarcoma  model.  Also  there  was  no  difference  in  GSAT 
levels  in  p53-/-  129/Svsl  that  developed  teratomas  at  a  young  age  (~1  month  old) 
versus  at  an  older  age  (~3-4  months  old)  (3,4). 

i-ncRNA  generation.  Sequences  encoding  for  murine  GSAT  and  human  HSATII  were 
generated  by  custom  gene  synthesis  (Genscript)  and  cloned  into  a  pCDNA3  backbone 
(EcoRI/EcoRV)  that  carries  a  T7  promoter  on  the  +  strand  and  a  SP6  promoter  on  the  - 
strand  (Invitrogen).  Sequences  encoding  for  GSAT-sc,  HSATII-sc,  UCON38  and 
RMER16A3  were  generated  as  minigenes  and  sub-cloned  in  a  pIDT-blue  backbone  with 
a  T7  promoter  on  the  +  strand  and  a  T3  promoter  on  the  -  strand  surrounding  the 
sequence of interest (IDT). To produce high quality RNA, plasmids were digested by the
restriction enzymes NotI/NdeI (pCDNA3) and ApaLI (pIDT blue) to isolate the fragment
containing the sequence of interest by gel purification (Qiagen). Then the sequences of
interest containing the T7 promoter were amplified by PCR (Accuprime-PFX, Invitrogen)
using the following primer pairs:

pIDT blue - Forward: GCGCGTAATACGACTCACTATAGGCGA;
Reverse: CGCAARRAACCCTCACTAAAGGGAACA

pCDNA3 - Forward: GAAATTAATACGACTCAATAGG;
Reverse: TCTAGCATTTAGGTGACACTATAGAATAG

PCR products were purified by PCR cleanup (Qiagen) and checked by electrophoresis
(0.8% agarose gel). RNAs were generated by in vitro transcription using the mMESSAGE
mMACHINE T7 Ultra kit (Ambion), followed by a capping and short poly(A) reaction. RNAs
were then purified using RNA cleanup (Qiagen), quantified using a NanoDrop, and
checked by electrophoresis after denaturation at 65 °C for 10 minutes (1.5% agarose
gel).

Cell stimulation. MoDCs and imBM were stimulated by i-ncRNA in the same way; the
culturing of these cells is described below. Briefly, cells were plated in flat-bottom
96-well plates at 200,000 cells per well for primary cells (moDCs) and 100,000 cells
per well for cell lines (imBM). i-ncRNA were transfected via liposomes formed using
DOTAP (Roche Life Science) at a ratio of 1 µg DNA per 6 µl DOTAP diluted in HBS,
following the user guide recommendations. The cells were stimulated using 2 µg/ml of
purified i-ncRNA versus 10 µg/ml total RNA. As pathway controls we used: for TLR4,
100 ng/ml Ultrapure LPS (Invivogen); for TLR2, 500 ng/ml Pam2CSK4 (Invivogen); for
TLR3, 2 µg/ml HMW Poly-IC (Invivogen); for TLR7/8, 1 µg/ml CL097 (Invivogen) and
100 ng/ml R848 (Invivogen); for TLR9, 3 µM CpG B-ODN 1826; and for STING, 5 µg/ml CDN
(Aduro).
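
The dosing arithmetic in this paragraph can be illustrated with a short sketch that
converts a final nucleic acid concentration into per-well amounts. The well volume of
200 µl and the function name dotap_mix are assumptions made for illustration only; the
text does not state the culture volume per well.

```python
def dotap_mix(conc_ug_per_ml, well_volume_ul, dotap_ul_per_ug=6.0):
    """Return (nucleic acid in ug, DOTAP in ul) needed for one well.

    dotap_ul_per_ug defaults to the 6 ul DOTAP per 1 ug ratio given in the text.
    """
    # Convert the final concentration (ug/ml) into a mass for this well volume.
    nucleic_acid_ug = conc_ug_per_ml * well_volume_ul / 1000.0
    # Scale the liposome reagent with the nucleic acid mass.
    dotap_ul = nucleic_acid_ug * dotap_ul_per_ug
    return nucleic_acid_ug, dotap_ul


# Hypothetical example: 2 ug/ml i-ncRNA in an assumed 200 ul well.
rna_ug, dotap_ul = dotap_mix(conc_ug_per_ml=2.0, well_volume_ul=200.0)
print(f"{rna_ug:.2f} ug RNA and {dotap_ul:.2f} ul DOTAP per well")
```

Under the assumed volume, the amounts scale linearly, so a different well volume only
changes the well_volume_ul argument.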

Cell culture. Human moDCs: Human monocyte-derived DCs were differentiated as
previously described (5). Briefly, PBMCs were prepared by centrifugation over
Ficoll-Hypaque gradients (BioWhittaker) from healthy donor buffy coats (New York Blood
Center). Monocytes were isolated from PBMCs by adherence and then treated with 100
U/ml GM-CSF (Leukine, Sanofi Oncology) and 300 U/ml IL-4 (R&D) in RPMI plus 5% human
AB serum (Gemini Bio Products). Differentiation media was renewed on day 2 and day 4
of culture. Mature moDCs were harvested for use on days 5 to 7. For all experiments,
harvested DCs were washed and equilibrated in serum-free X-Vivo 15 media (Lonza).

Murine imBMs: Macrophages were immortalized by infecting bone marrow progenitors with
oncogenic v-myc/v-raf-expressing J2 retrovirus as previously described (6) and
differentiated in macrophage differentiation media containing M-CSF. ImBM were
maintained in 10% FCS PSN DMEM (Gibco). ImBM lines were kindly provided by several
collaborators or obtained from the BEI resource: ICE (Casp1/Casp11), MAVS, IFN-R,
IRF3-7 (Dr. K. Fitzgerald, University of Massachusetts); STING and their rescues
(Dr. R. Vance, University of California, Berkeley); Unc93b1 3d/3d (Dr. G. Barton,
University of California, Berkeley); TLR 3, 4, 7, 9, 2-9, 2-4, MYD88, TRIF, TRAM,
TRIF-TRAM (BEI resource, ATCC/NIAID).

Investigation of Type I Interferon Pathway. To characterize whether this pathway could
be modulated in our models, we evaluated production of type I interferon in response
to stimulation by our i-ncRNA using human and murine interferon-stimulated response
element (ISRE) reporter cell lines, and monitored transcriptome regulation of a panel
of immune genes related to the interferon pathway. Whereas the effect on the
inflammatory response was significant in terms of TNFalpha, IL-6, or IL-12 production,
the effect on the type I interferon pathway was less prominent.

Additional Pathways Investigated. TLR2 and TLR4 were not required, indicating that the
observed effect was independent of contamination from bacterial products such as
lipoproteins and endotoxins (Supplementary Figure 8). TRIF, TRIF/TRAM, and IRF3/IRF7,
which participate downstream in the signaling of TLR3, TLR4, and TLR7, were also not
obligatory (Supplementary Figure 5). We did not identify a role for candidate
molecules for sensing murine GSAT, such as sensors related to cGAS-STING signaling or
DEAD box RNA helicases such as RIG-I and MDA5 (7-10). Inflammatory responses to GSAT
did not depend upon the stimulator of interferon genes (STING), which induces type I
interferon production when cells are infected with intracellular pathogens. RIG-I
(retinoic acid-inducible gene I) is a dsRNA helicase enzyme that senses RNA viruses
through activation of the mitochondrial antiviral-signaling protein (MAVS) (11-13).
MAVS-deficient imBMs still responded to GSAT stimulation, ruling out a contribution of
RIG-I in our i-ncRNA signaling (Supplementary Figure 7B). Finally, we ruled out a role
for inflammasome-related pathways using ICE-KO imBM, which are essentially a knockout
for Caspase 1 and carry an inactivating mutation in Caspase 11.

SUPPLEMENTARY FIGURES

[Figure: histograms titled "CpG Forces in GENCODE Human ncRNAs" and "UpA Forces in
GENCODE Human lncRNAs"; x-axes: values of forces on CpG and values of forces on UpA.]

Supplementary Figure 1. CpG and UpA Are Generally Under-represented in ncRNA.
Histogram of forces (strength of statistical bias) on (A) CpG and (B) UpA for lncRNA
from the GENCODE Human transcript database. These forces are consistent with those
observed in mice and those from coding regions.


[Figure: panels titled "Least PCA for Significant Forces on Human GENCODE lncRNAs" (A)
and "Least PCA for Significant Forces on Murine GENCODE lncRNAs" (B).]

Supplementary Figure 2. Forces on CpG and UpA Dinucleotides Are Independent. Least
principal components for all significant forces on motifs for (A) human and (B) mouse
GENCODE ncRNA. In both cases CpG and UpA dominantly project onto the two least axes of
variation.

Supplementary Figure 3. GSAT is Expressed in Mouse Testicular Teratoma and
Liposarcoma. Relative expression levels of GSAT RNA measured by a custom TaqMan assay
in normal murine tissue versus murine tumor tissue samples. The tumor mouse models
investigated were (A) testicular teratoma and (B) liposarcoma induced in a p53-KO
background. In all instances, GSAT levels were higher in the tumor samples than in
normal samples, to varying degrees.

Supplementary Figure 4. ncRNA Require Transfection to Induce Cellular Innate Immune
Responses. The various ncRNA (HSATII, HSATII-sc, GSAT, GSAT-sc) were used at 2 µg/ml
to stimulate human DCs in 96-well plates with (DOTAP) or without (NT) DOTAP as a
gentle liposomal transfection reagent. In the absence of transfection reagent the
ncRNA were not sensed by the DCs, whereas the transfected immunogenic ncRNA HSATII and
GSAT, in addition to Poly-IC and R848, were properly sensed and induced a cellular
inflammatory response measured as (A) TNFalpha, (B) IL-12, and (C) IL-6.


Supplementary  Figure  5.  Innate  Immune  Sensing  of  Nucleic  Acids.  Summary  of  the 
innate  immune  pathways  involved  in  the  sensing  of  nucleic  acids  which  were 
investigated  in  this  work.  MYD88  and  UNC93b,  highlighted  in  red,  were  directly 
implicated  in  i-ncRNA  sensing. 

Supplementary Figure 6. Human moDCs and Mouse imBM Cells Respond to Common PAMPs and
DAMPs. Quantification of inflammatory cytokine production in (A) human moDCs and (B)
murine imBM upon stimulation with common PAMPs or DAMPs known to activate PRR innate
immune pathways, which are listed in the Materials and Methods. Each point represents
the mean value of the experimental replicates for each individual condition; the bar
represents the median. (C) The inflammatory response related to type I IFN pathway
induction in imBM upon stimulation of the PRR-related innate immune pathways was
analyzed by qRT-PCR. The heat map represents the log of the relative expression of
each gene based on relative quantification analysis using the ddCT bi-dimensional
normalization method (normalization to housekeeping genes and to non-stimulated
cells).
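
The ddCT normalization named in this caption can be sketched as follows. This is a
minimal illustration of the standard ddCT calculation, assuming roughly 100% PCR
efficiency and a log base of 2; the caption does not state the base or efficiency used,
and the function name and Ct values are hypothetical.

```python
def ddct_log2_fold_change(ct_target_stim, ct_ref_stim, ct_target_ctrl, ct_ref_ctrl):
    """Return log2 relative expression of a target gene by the ddCT method.

    First normalization: target Ct minus housekeeping (reference) Ct.
    Second normalization: stimulated dCt minus non-stimulated dCt.
    """
    dct_stim = ct_target_stim - ct_ref_stim   # stimulated cells
    dct_ctrl = ct_target_ctrl - ct_ref_ctrl   # non-stimulated cells
    ddct = dct_stim - dct_ctrl
    # Relative expression is 2 ** (-ddct), so its log2 is simply -ddct.
    return -ddct


# Hypothetical Ct values: the target amplifies 4 cycles earlier after stimulation.
log2_fc = ddct_log2_fold_change(22.0, 18.0, 26.0, 18.0)
print(log2_fc, 2 ** log2_fc)
```

A positive value indicates induction relative to non-stimulated cells, and a negative
value indicates repression.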

Supplementary Figure 7. Genetic Screen of Innate Immune Pathways Related to i-ncRNA
Function in Murine imBM. (A) imBM cells of different knockout genotypes related to TLR
PRRs (TLR2-4 dbKO, TLR3 KO, TLR4 KO, TLR7 KO, TLR9 KO). (B) imBM cells of different
knockout genotypes related to the STING, inflammasome, and MAVS-dependent helicase
pathways (STING KO, MAVS KO, ICE KO), and to common innate immune signaling (TRIF KO,
TRAM KO, IRF3/IRF7 dbKO). Cells were stimulated by liposomal transfection of the
murine i-ncRNA (GSAT). TNFalpha production in the supernatant was quantified; each
point represents the mean value of the experimental replicates for each individual
condition, and the bar represents the median.

Supplementary Figure 8. Stimulation of KO and Mutant imBM with Common PAMPs and
DAMPs. Quantification of inflammatory cytokine production in (A) PRR KO imBM and (B)
innate immune signaling-related KO and mutant imBM upon stimulation with common PAMPs
or DAMPs known to activate PRR innate immune pathways. Each point represents the mean
value of the experimental replicates for each individual condition; the bar
represents the median.


SUPPLEMENTARY  TABLES 


Motif    Human      Mouse
CG       -1.419     -1.375
UA       -0.604     -0.548
ACG      -1.7586    -1.6216
CAG       0.5534     0.5612
CCG      -1.5095    -1.3287
CGA      -1.8995    -1.7082
CGC      -1.7304    -1.5525
CGG      -1.511     -1.2629
CGU      -1.7833    -1.6463
CUG       0.669      0.6748
GCG      -1.748     -1.5592
GUA      -0.8632    -0.7451
UAC      -0.7368    -0.6298
UAG      -0.733     -0.592
UCG      -1.9391    -1.7049

Supplementary Table 1. Average Forces on Motifs are Similar Between Humans and Mice.
Average force on a given motif in the Human and Mouse GENCODE datasets, for lncRNAs
longer than 500 nucleotides. The forces are listed for the motifs that are significant
in humans. The force is a measure of the strength of statistical bias to enhance or
suppress a motif relative to what is expected from that sequence's nucleotide content.
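
To make the idea of a bias relative to nucleotide content concrete, the sketch below
computes a simple log observed/expected ratio for a dinucleotide. This is a simplified
proxy chosen for illustration, not the force calculation used for the table, and the
function name and example sequences are hypothetical. Negative values indicate that
the motif is under-represented given the sequence's base composition, and positive
values indicate that it is over-represented.

```python
import math
from collections import Counter


def dinucleotide_log_odds(seq, dinuc):
    """Natural log of observed/expected frequency for a dinucleotide in an RNA."""
    rna = seq.upper().replace("T", "U")  # accept DNA or RNA input
    if len(rna) < 2:
        raise ValueError("sequence must contain at least two nucleotides")
    base_counts = Counter(rna)
    pair_counts = Counter(rna[i:i + 2] for i in range(len(rna) - 1))
    # Observed frequency among all overlapping dinucleotide positions.
    observed = pair_counts[dinuc] / (len(rna) - 1)
    # Expected frequency if the two bases occurred independently.
    expected = (base_counts[dinuc[0]] / len(rna)) * (base_counts[dinuc[1]] / len(rna))
    if expected == 0:
        raise ValueError("motif nucleotides are absent from the sequence")
    if observed == 0:
        return float("-inf")
    return math.log(observed / expected)


# Toy sequences with identical base composition but different CpG content.
print(dinucleotide_log_odds("CGCGCGCG", "CG"))  # CpG enriched: positive
print(dinucleotide_log_odds("CCCCGGGG", "CG"))  # CpG depleted: negative
```

The two toy sequences have the same C and G content, so the difference in sign comes
only from how the nucleotides are arranged.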


ncRNA           Class                   Level of Conservation   CpG Force
MER123          DNA_transposon          Amniota                 1.1039
HSATII          SAT                     Primates                1.036
UCON21          Transposable_Element    Amniota                 0.9465
MER6B           Mariner/Tc1             Homo_sapiens            0.923
Eulor1          Transposable_Element    Amniota                 0.8481
Eulor5B         Transposable_Element    Tetrapoda               0.8474
Eulor2C         Transposable_Element    Amniota                 0.7676
Eulor6A         Transposable_Element    Tetrapoda               0.7466
MER131          SINE                    Amniota                 0.6223
Eulor4          Transposable_Element    Tetrapoda               0.6067
Eulor10         Transposable_Element    Amniota                 0.6064
MER6C           Mariner/Tc1             Eutheria                0.5667
Eulor12         Transposable_Element    Amniota                 0.5295
MER5C1          hAT                     Eutheria                0.4582
MER47B          Mariner/Tc1             Eutheria                0.4518
UCON39          DNA_transposon          Mammalia                0.4443
UCON16          Transposable_Element    Amniota                 0.4436
Tigger3d        Mariner/Tc1             Primates                0.4374
TIGGER5A        Mariner/Tc1             Eutheria                0.4212
MER75           DNA_transposon          Homo_sapiens            0.4134
Tigger4a        Mariner/Tc1             Primates                0.3815
npiggy2_Mm      piggyBac                Microcebus_murinus      0.3725
MER58B          hAT                     Eutheria                0.3657
Eulor6C         Transposable_Element    Tetrapoda               0.3571
Eulor11         Transposable_Element    Amniota                 0.3561
UCON15          Transposable_Element    Amniota                 0.356
Tigger2b_Pri    Mariner/Tc1             Primates                0.3548
MER44B          Mariner/Tc1             Homo_sapiens            0.3536
SUBTEL_sat      Satellite               Primates                0.3527
Eulor9A         Transposable_Element    Amniota                 0.3465
MER44C          Mariner/Tc1             Homo_sapiens            0.3439
Eulor8          Transposable_Element    Amniota                 0.3416
MER44D          Mariner/Tc1             Eutheria                0.3211
npiggy1_Mm      piggyBac                Microcebus_murinus      0.3131
UCON26          Transposable_Element    Amniota                 0.2985
MER127          Mariner/Tc1             Amniota                 0.2984
MER97d          hAT                     Eutheria                0.2939
Eulor6D         Transposable_Element    Tetrapoda               0.2866
Eulor2B         Transposable_Element    Amniota                 0.2852
MER119          hAT                     Homo_sapiens            0.2794
MER134          Transposable_Element    Amniota                 0.2786
Eulor9C         Transposable_Element    Amniota                 0.2751
MER8            Mariner/Tc1             Homo_sapiens            0.2669
Ricksha_a       MuDR                    Eutheria                0.2607
MER129          SINE                    Amniota                 0.2444
MacERV6_LTR3    ERV3                    Cercopithecidae         0.2404
MER57B2         ERV1                    Homo_sapiens            0.2403
HSMAR1          Mariner/Tc1             Homo_sapiens            0.2397
Eulor12_CM      Transposable_Element    Amniota                 0.2269
MERX            Mariner/Tc1             Eutheria                0.2207
Tigger12A       Mariner/Tc1             Mammalia                0.217
MER58A          hAT                     Eutheria                0.2006

Supplementary Table 2. Many Repetitive Elements Have High CpG Forces. Listed above are
the repetitive elements from Repbase with a significantly high CpG force. These
elements are typically not found to be expressed in normal tissue, yet some may be
expressed in cancer cells and cell lines.


